Jul 15, 2014

3 Great Examples of Data Journalism Stories

Over the last month I've been spending part of my free time learning about an emerging discipline in the area of data analytics: Data Journalism. I am doing it firstly because I find the combination "data analysis + journalism" fascinating, but also because, as a Web Analyst, I believe there are some very important skills I can learn from Data Journalists (here is a post where I talk about Web Analyst skills).

The aim of this post is to introduce you to this emerging discipline and show you some practical examples of data journalism. To do so, I've selected 3 published data journalism stories and analysed each of them by answering four key questions:
  1. What does the story do?
  2. How was it created (methodology)? 
  3. How was it illustrated?
  4. What technologies were used to create and present the story to readers?   

If, after reading this post, you would like to know more about data journalism, I encourage you to sign up for this MOOC: Doing Journalism with Data, offered by the European Journalism Centre through the Canvas Network platform. The course actually started in May and the material will be available until 30 July, so you still have some time.

But first, keep scrolling down and get started with Data Journalism right here!

What is Data Journalism?

To put it very simply, Data Journalism is Journalism done with Data.

I know this is not a very helpful definition, and the reality is that if we asked different journalists for a definition of data journalism, we would probably get several different answers.

To make things a bit clearer, Simon Rogers (Data Editor @ Twitter) looks at some key aspects of doing journalism with data. He suggests that data journalism is about:
  • telling stories with numbers
  • finding the best way to tell this story
  • the techniques with which you tell the story (which keep changing all the time)

But hold on. Is data journalism a totally new discipline? Not really. Data have always been at the heart of stories. In some areas, such as sports, data have always been an essential part of what is delivered to the reader.

So what has happened in recent years to make data journalism emerge so rapidly?

1) First of all, more and more data are accessible to everyone, as never before. The media can now look for data for their stories in many different sources, such as public databases on government spending, leaked documents published by Wikileaks, or "big data" generated by social networks such as Facebook and Twitter.

2) Secondly, data analysis tools are available to most of us. Who doesn't have a copy of Excel (or equivalent spreadsheet software) installed on their computer? Some very powerful tools are even free or open source: think of the R statistical programming language.

3) Last, data analysis tools are much more powerful and easier to use than in the past. In many cases you can produce beautiful visualizations or maps without any technical experience or coding knowledge.

Because of these changes, the journalism field is under pressure to change, and so are the skills needed by "new journalists". Journalists will have to become knowledgeable in searching, cleaning, processing, analysing and visualizing data. They will have to mine the data, make sense of it and turn it into something interesting for the reader.

Finally, we can group the main activities of a data journalist (or the skills needed) into 4 categories:
  1. Finding data to support stories
  2. Analysing the data to discover potential stories
  3. Cleaning the data
  4. Telling stories through visualizations

Some pioneering journalists and newspapers are already demonstrating how data can deliver unique insights into what is happening around us, and they are creating very interesting stories from them. Below I show you three examples of published data journalism, and analyse each of them using the four questions listed above.

Three Great Data Journalism Stories

Story #1. Afghanistan War Logs: a Selection of Significant Accidents

Published by: The Guardian Data Blog

[Image: Afghanistan War Logs, significant accidents map]

What does the story do?
This great data story, created and published by The Guardian, shows a selection of key events that happened during the war in Afghanistan, such as coalition forces attacking civilians, friendly fire incidents (coalition troops mistakenly firing on each other), and Afghan forces attacking each other. To achieve this, The Guardian used data that became available through Wikileaks, which disclosed previously confidential military records about the war in Afghanistan.

How was it created?
What we are talking about here is one of the biggest leaks in intelligence history. The Guardian got a huge Excel file from Wikileaks logging the history of the war in Afghanistan. The file contained over 90,000 rows of data, some of which, of course, were empty or poorly formatted.

The data obviously needed some cleaning, and Excel showed its limits in processing such a large amount of data: reporters could not access the data easily, so it was hard to extract meaningful stories. The Guardian data team therefore built an internal database to store and access the data, so that reporters could look for stories by searching for keywords and events. One of the key stories found in the war data was the rise in attacks using IEDs (improvised explosive devices).
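As far as I know The Guardian has not published the code behind its database, so the following is only a minimal Python sketch of the same idea, under stated assumptions: a handful of made-up rows stand in for the leaked spreadsheet, the column names (date, category, summary, lat, lon) are hypothetical, and SQLite (through Python's built-in sqlite3 module) plays the role of the "simple database". It drops empty rows, keeps events whose coordinates are badly formatted but stores those coordinates as NULL, and supports a keyword search optionally restricted to one category.

```python
import sqlite3

# Hypothetical rows standing in for the leaked spreadsheet (NOT real log data):
# (date, category, summary, latitude, longitude). One row is empty and one has
# a badly formatted coordinate, to mimic the messy source file.
RAW_ROWS = [
    ("2009-06-01", "Coalition friendly fire", "  Patrol fired on by coalition unit ", "31.6", "65.7"),
    ("", "", "", "", ""),
    ("2009-06-03", "Enemy action", "IED found near checkpoint", "34.5", "69.2"),
    ("2009-06-04", "Enemy action", "IED strike on convoy", "n/a", "64.4"),
]


def parse_coord(value):
    """Return the coordinate as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except ValueError:
        return None  # keep the event, but leave the coordinate empty (NULL)


def clean_rows(rows):
    """Skip empty rows and normalise whitespace and coordinates."""
    for date, category, summary, lat, lon in rows:
        if not date.strip():
            continue  # an empty spreadsheet row
        yield date.strip(), category.strip(), summary.strip(), parse_coord(lat), parse_coord(lon)


def build_db(rows):
    """Load the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (date TEXT, category TEXT, summary TEXT, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", clean_rows(rows))
    return conn


def search(conn, keyword, category=None):
    """Keyword search over summaries (SQLite's LIKE is case-insensitive for ASCII)."""
    sql = "SELECT date, category, summary FROM events WHERE summary LIKE ?"
    params = [f"%{keyword}%"]
    if category is not None:
        sql += " AND category = ?"
        params.append(category)
    return conn.execute(sql + " ORDER BY date", params).fetchall()


conn = build_db(RAW_ROWS)
for row in search(conn, "IED"):
    print(row)
```

With the real data you would load the rows from the spreadsheet export (for example with Python's csv module) rather than from a hard-coded list. Note that a case-insensitive substring search can produce false hits (a short keyword may match inside unrelated words), so results still need a human check.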

They then mapped the latitude and longitude coordinates of every event, selected the key events to include in the story, and finally created a graphic visualization with the help of Google Maps.

How was it illustrated?
The story was visualized by plotting the key events as points of different colours on a geographical map, in this case a Google Map. As you can see in the picture above, the colours identify the category of each event (Afghan friendly fire vs Coalition friendly fire, etc.).

If you click on a category in the map legend, you can hide it or show it on the map. Also, if you click on any data point, a small window opens with a brief description of the event, its category, and the date and time it occurred. By clicking on "Read the full log entry" you can see the complete log of that event.
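To make the mapping step more concrete, here is a minimal Python sketch, assuming each event is already a (category, description, timestamp, lat, lon) tuple. It converts the events into GeoJSON, a common format that web mapping tools (the Google Maps JavaScript API's Data layer among them) can load, and attaches the category, description, time and a colour to each point so that a popup window and a legend can use them. The category names come from the story, but the colours, the sample events and the `visible` filter that mimics the legend toggle are my own assumptions, not how The Guardian actually built its map.

```python
import json

# Hypothetical colour per category; the palette of the real map is not known.
CATEGORY_COLOURS = {
    "Afghan friendly fire": "#1f77b4",
    "Coalition friendly fire": "#d62728",
}
DEFAULT_COLOUR = "#7f7f7f"  # any category without an assigned colour


def to_geojson(events, visible):
    """Build a GeoJSON FeatureCollection from (category, description, timestamp, lat, lon)
    tuples, keeping only categories in `visible` (like toggling them in a legend)."""
    features = []
    for category, description, timestamp, lat, lon in events:
        if category not in visible:
            continue  # category switched off in the legend
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude], not lat/lon
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {  # fields a popup window could display
                "category": category,
                "description": description,
                "time": timestamp,
                "colour": CATEGORY_COLOURS.get(category, DEFAULT_COLOUR),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample events, not taken from the war logs.
events = [
    ("Afghan friendly fire", "Sample event A", "2008-03-01 10:00", 31.6, 65.7),
    ("Coalition friendly fire", "Sample event B", "2008-05-12 14:30", 34.5, 69.2),
    ("Civilian casualties", "Sample event C", "2009-01-20 08:15", 32.4, 64.3),
]
print(json.dumps(to_geojson(events, {"Afghan friendly fire", "Civilian casualties"}), indent=2))
```

Keeping the filtering separate from the drawing means the same data file can feed a legend, a popup, or a static chart.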

I recommend you explore the Afghan War Logs map yourself and play a bit with the data.

What technologies were used to create and present the story to readers?
The Wikileaks data, about 92,000 entries, were recorded in a spreadsheet; the team then built a simple database and interrogated it with SQL. Finally, I guess they used the Google Maps API to produce the map.

Story #2. Weapons from Croatia Spread through the Conflict in Syria

Published by: The New York Times (final article); the story was originally created by Eliot Higgins, a.k.a. Brown Moses

[Image: Weapons Smuggled into Syria]

What does the story do?
The story reveals how, at some point during the war in Syria, some very unusual weapons appeared in the conflict, apparently all coming from the former Yugoslavia. These findings led to a very important conclusion: the Saudis had purchased the weapons from Croatia, shipped them to Jordan, and started smuggling them into Syria to support the Free Syrian Army in its fight against President Assad, probably with the knowledge of the US Government.

How was it created?
It all began in 2012, when an unemployed finance worker named Eliot Higgins started a blog about the Syrian civil war. As he has said, his early posts were rather disorganised collections of videos he had seen on Facebook and Twitter. But after a couple of months, he started adopting a more systematic approach to collecting and examining videos coming from Syria.

First, he gathered a list of all the channels posting from each specific area of Syria. He ended up monitoring over 500 YouTube channels daily, searching for images of weapons and tracking when new types of weapons appeared in the conflict, where, and with which armed group. To identify the type of weapon, he mainly relied on Google. He eventually collected all this data in a spreadsheet and analysed it.
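Higgins did this in a spreadsheet, not in code, but the core of his method (recording every sighting and noticing when a new weapon type first appears, where, and with whom) is easy to sketch. Below is a minimal Python version, assuming a hypothetical log with one row per video and made-up fields `date`, `weapon`, `region` and `group`; the weapon labels, regions and dates are placeholders, not his data.

```python
from datetime import date

# Hypothetical sighting log in the spirit of Higgins's spreadsheet: one row per
# video showing a weapon. Field names, weapon labels and dates are all made up.
SIGHTINGS = [
    {"date": date(2012, 6, 2), "weapon": "Weapon A", "region": "north", "group": "Group 1"},
    {"date": date(2012, 12, 18), "weapon": "Weapon X", "region": "south", "group": "Group 2"},
    {"date": date(2012, 7, 9), "weapon": "Weapon A", "region": "south", "group": "Group 3"},
    {"date": date(2013, 1, 5), "weapon": "Weapon Y", "region": "south", "group": "Group 2"},
    {"date": date(2013, 1, 11), "weapon": "Weapon X", "region": "east", "group": "Group 4"},
]


def first_sightings(rows):
    """Map each weapon type to its earliest sighting: (date, region, group)."""
    first = {}
    for row in sorted(rows, key=lambda r: r["date"]):
        # setdefault keeps only the first (earliest) row seen for each weapon
        first.setdefault(row["weapon"], (row["date"], row["region"], row["group"]))
    return first


def new_weapons_since(rows, cutoff):
    """Weapon types whose first appearance is on or after `cutoff`."""
    return sorted(w for w, (d, _, _) in first_sightings(rows).items() if d >= cutoff)


for weapon, (when, region, group) in sorted(first_sightings(SIGHTINGS).items()):
    print(f"{weapon}: first seen {when} in the {region}, with {group}")
print(new_weapons_since(SIGHTINGS, date(2012, 12, 1)))
```

Sorting by date before taking the first row per weapon is what makes "first appearance" meaningful, since videos are rarely logged in the order they were filmed.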

Once he noticed there might be an important story behind the data (unusual weapons appearing mainly in the southern region of Syria, near the border with Jordan), he first published his findings on his blog, and later went to The New York Times with an article summarising what he had found.

The New York Times did further investigations and eventually published the article.

A very interesting point made by Eliot Higgins concerns the effectiveness of his data collection methodology. By monitoring social media he was able to track the arrival of those unusual weapons in Syria, something he might not have picked up on the ground. For this type of analysis, he had a much better picture of what was going on in Syria than a journalist based locally. If you are interested in this fascinating story, you must read Higgins's own account on School of Data, or just google it.

How was it illustrated?
I have not found any published visualizations of the findings. The story published in The New York Times is text-only, though it references the Brown Moses blog, where you can find various pictures and videos documenting the conflict in Syria and the weapons discovered.

As a personal thought, it would be very interesting to summarize and show the story through some type of visualization. One idea could be a geographical map with points plotted on it indicating the different types of arms used in the conflict, together with a timeline showing when Croatian weapons started spreading through the conflict. Any other ideas?
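For the timeline half of that idea, the first step is simply counting sightings per month. Here is a minimal Python sketch, assuming a hypothetical sighting log with `date` and `weapon` fields and placeholder weapon names (not real data). Months with no sightings are kept as zeros so that gaps stay visible on the timeline, and a crude text bar chart stands in for a real chart.

```python
from collections import Counter
from datetime import date

# Hypothetical sightings (dates and weapon labels are made up).
SIGHTINGS = [
    {"date": date(2012, 12, 18), "weapon": "Weapon X"},
    {"date": date(2012, 12, 29), "weapon": "Weapon X"},
    {"date": date(2013, 2, 3), "weapon": "Weapon Y"},
    {"date": date(2013, 2, 14), "weapon": "Weapon X"},
    {"date": date(2013, 2, 20), "weapon": "Weapon Z"},  # not in the tracked set below
]


def monthly_timeline(rows, weapon_types):
    """Count sightings per month for the given weapon types, including empty months."""
    counts = Counter(
        (r["date"].year, r["date"].month) for r in rows if r["weapon"] in weapon_types
    )
    if not counts:
        return []
    (year, month), last = min(counts), max(counts)
    timeline = []
    while (year, month) <= last:
        # a Counter returns 0 for months with no sightings
        timeline.append((f"{year}-{month:02d}", counts[(year, month)]))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return timeline


for label, n in monthly_timeline(SIGHTINGS, {"Weapon X", "Weapon Y"}):
    print(f"{label} {'#' * n}")
```

The same monthly counts could then feed any charting tool, and the `region` field from the sighting log would drive the map half of the idea.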

What technologies were used to create and present the story?
He regularly monitored images and videos from social media such as Facebook, Twitter and YouTube, and collected all the data in a spreadsheet (Excel, I guess).

Story #3. The Cholera Map of John Snow (1854)

Published by: John Snow in 1854
Article link: I am not sure whether the original publication is available on the internet; however, his amazing work has been widely discussed by many experts in data visualization and journalism. I recommend reading the interesting article from The Guardian, where they also recreate John Snow's story with an interactive map. Below is the original cholera map.

[Image: John Snow's cholera map]

What does the story do?
Until the 1870s, most scientists believed that cholera, like many other diseases, was caught by breathing "polluted" air. In 1854, a severe outbreak of cholera in the London district of Soho gave the physician John Snow the occasion to study the phenomenon closely and to challenge the dominant theory by hypothesizing that cholera was a waterborne disease caused by germs.

He concluded that the source of the London outbreak was the public water pump on Broad Street (now Broadwick Street). In other words, cholera spread through contaminated water, not through "polluted" air as most scientists believed.

How was it created?
John Snow talked to local residents, found out where the cholera cases occurred and collected all the data. He then made a map of all the cases, representing each death as a bar placed at the exact point where it happened. Thanks to the map, Snow could show the story behind the data: most cholera deaths clustered around the Broad Street pump, which, as was later discovered, had been contaminated by fecal matter from a sick baby.
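Snow reasoned from his hand-drawn map, but the question behind it (which pump is each death closest to?) can be sketched in a few lines of Python. This is an illustration under loud assumptions: the pump names other than Broad Street and all coordinates are invented on an arbitrary grid, not taken from Snow's data, and straight-line distance is a simplification (walking distance along the streets would be more realistic). It uses `math.dist`, available from Python 3.8.

```python
import math
from collections import Counter

# Invented coordinates on an arbitrary grid, NOT Snow's real data.
# "Pump B" and "Pump C" are hypothetical names for other pumps in the area.
PUMPS = {"Broad Street": (0.0, 0.0), "Pump B": (5.0, 1.0), "Pump C": (-4.0, 3.0)}
DEATHS = [(0.5, 0.2), (-0.3, 0.8), (1.0, -0.5), (4.6, 1.2), (0.2, -0.1), (-3.5, 2.5)]


def nearest_pump(point, pumps):
    """Name of the pump with the smallest straight-line distance to `point`."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


def deaths_per_pump(deaths, pumps):
    """Count how many deaths are closest to each pump."""
    return Counter(nearest_pump(p, pumps) for p in deaths)


print(deaths_per_pump(DEATHS, PUMPS).most_common())
```

A clear majority of deaths falling closest to a single pump is exactly the kind of pattern Snow's map made visible, although on its own it shows association, not cause.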

How was it illustrated?
With bars on a map representing the number of cholera deaths at each location in Soho, and circles showing the locations of the water pumps.

This type of data visualization allowed him to relate the two variables: pump locations and number of cholera cases.

What technologies were used to create and show the story?
At that time, I guess Snow produced the map with just pen and paper.

As I mentioned earlier, his data story has recently been revisited by The Guardian, which recreated the cholera map using modern mapping tools such as CartoDB and Stamen map styles.

For more about the impact of Snow's story on current data journalism and data visualization, you can check the very interesting posts by Simon Rogers and Alberto Cairo.

Conclusions: What can we Learn from Data Journalism?

In this post I introduced the emerging field of data journalism and used 3 examples to show how some pioneering journalists are analysing data to find insightful stories.

The technologies used to create and present the stories varied in sophistication. Still, all the journalists followed a common process to build their story: they had to find data, analyze it, clean it, and finally communicate the story, possibly using some graphic visualization.

As a Web Analyst, I think that looking at how journalists create stories with data is a great learning exercise. It encourages us to be more creative and curious and to develop a good critical attitude, which will help us in our daily job. Just as importantly, it should make us realise that we share many aspects of our job with data journalists. Indeed, both of us follow the same process in building the story (find, analyse, clean, visualize) and have to communicate the final results to someone (who is sometimes new to the subject).

Okay, I guess in many cases our job (web analysis) is more "standardized" than a journalist's, in the sense that we tend to stick to the same data sources, tools and techniques to collect, analyse and visualize data. But this should not stop us from thinking outside the box and looking for better ways to solve our data problems. For example, we might start asking ourselves questions like:

How could we collect more interesting data for our web analysis? Is there any other available data we could combine with our website clickstream data, so that we can make better decisions? Think of the Syria story and how Eliot Higgins started monitoring social media to get insights into the conflict in Syria. Would he have been able to get the same data by staying on the ground? Probably not.
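As a small illustration of what "combining" could mean in practice, here is a Python sketch that joins daily sessions from a web analytics export with daily social media mentions on the dates they share, then computes a Pearson correlation. All numbers are invented and both data sources are assumptions. With so few days, and with correlation saying nothing about causation, the result is a prompt for further questions rather than a finding.

```python
import math
from datetime import date

# Invented daily figures: sessions from a hypothetical analytics export and
# mentions from a hypothetical social media monitoring tool.
sessions = {date(2014, 7, d): n for d, n in [(1, 1200), (2, 1850), (3, 1100), (4, 1300), (5, 1250)]}
mentions = {date(2014, 7, d): n for d, n in [(1, 15), (2, 90), (4, 20), (5, 18), (6, 30)]}


def join_by_date(left, right):
    """Inner join of two {date: value} dicts: keep only dates present in both."""
    return [(d, left[d], right[d]) for d in sorted(left.keys() & right.keys())]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


joined = join_by_date(sessions, mentions)
for day, s, m in joined:
    print(day, s, m)
print(round(pearson([s for _, s, _ in joined], [m for _, _, m in joined]), 3))
```

The inner join silently drops days missing from either source (here 3 and 6 July), which is worth reporting alongside any result.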

And finally, what would be the best graphic format for our monthly/weekly report? Could we replace current tables/charts with more insightful graphic visualizations? Sometimes it can be a great idea to just grab pen and paper and sketch what we want to show with the data. John Snow's cholera map is an excellent example of an insightful data visualization.

See you in the next post! Thanks for reading.