According to Google's 2017 open book on data journalism, data journalism covers stories that are enriched by data, stories that use data to investigate, and stories that explain data. Google Trends shows that interest in the concept of data journalism has risen significantly since 2009.
But what exactly is a data journalist, and what should data journalists look like in the foreseeable future? By digging into the Global Data Journalists Directory published online by Journalism++, we try to answer these two questions in a data-driven way.
Here is a comparison of Google Trends interest in the concept of data journalism with the GitHub contributions, from 2008 onward, of data journalists living in the five countries most active in the data journalism industry.
It is striking that many of the peaks in the two charts fall at the same points in time. Since a GitHub contribution records each time someone uploads programming work, this suggests there may be a correlation between programming skills and data journalism. That raised a question for us: are data journalists essentially programmers, or is programming at least an important skill in the data journalism industry?
What is the relationship between programming and data journalism, and is programming a key skill for a data journalist? The answer should offer useful reference and guidance for prospective data journalists.
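Matching peaks are a visual impression, so one way to put a number on it is a Pearson correlation between the two yearly series. The sketch below is only an illustration: the function name and the numbers are made-up placeholders, not the data behind our charts, and a high value would still not show that one trend causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means, and the two sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder yearly values (assumptions for illustration only).
trend_interest = [10, 14, 22, 30, 41, 55]
github_contributions = [120, 150, 260, 310, 480, 600]

r = pearson(trend_interest, github_contributions)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 means the two series rise and fall together; a value close to 0 means they show no linear relationship.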
We crawled more than 300,000 pieces of data from GitHub for journalists and reporters in countries around the world, grouped them by country and region, and summed their GitHub contributions to obtain the total amount of data journalism work that each country or region has contributed on GitHub.
In the picture above, countries are shaded in different depths of blue. The deeper the blue, the more data journalism work that region has contributed on GitHub. The United States is clearly the largest contributor, and European countries have also contributed a great deal.
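The grouping step can be sketched in a few lines. This is a minimal illustration rather than our actual pipeline: the record layout, the field names and every value below are assumptions.

```python
from collections import defaultdict

# Hypothetical crawled records; field names and numbers are placeholders.
records = [
    {"user": "journalist_a", "country": "United States", "contributions": 420},
    {"user": "journalist_b", "country": "United States", "contributions": 95},
    {"user": "journalist_c", "country": "Germany", "contributions": 310},
    {"user": "journalist_d", "country": "United Kingdom", "contributions": 150},
    {"user": "journalist_e", "country": "Germany", "contributions": 40},
]


def total_contributions_by_country(rows):
    """Sum the contribution counts of all journalists in each country."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["country"]] += row["contributions"]
    return dict(totals)


totals = total_contributions_by_country(records)

# Rank countries from the largest total contribution to the smallest.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for country, total in ranked:
    print(country, total)
```

The per-country totals are what a map like ours would then translate into shades of colour.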
The Irish data scientist Bahareh Heravi has produced a visual analysis, by region, of the educational institutions that offer data-related journalism programs, based on statistics for 219 entries around the world.
In this picture, each dot marks the area where a university or institution offering data-related journalism courses is located, and the darker the dot, the more schools offer data courses in that area. The dots in the United States are both widely distributed and dark in colour.
Judged by its contribution of data journalism work on GitHub, the United States accounts for a large share. It also holds a leading position in spreading and promoting data journalism education. We believe these findings help explain why the U.S. data journalism industry leads the world.
Because the educational environment is ample and diverse, prospective data journalists can learn and master the knowledge and skills they need before entering the industry, which continually raises the share and competitiveness of data journalism in the United States. We therefore decided to study the United States' data journalism field in depth.
In this part, we analyze the US along three dimensions: geography, the number of data journalists, and their GitHub contributions. The picture below shows the number of data journalists in different US cities. New York clearly has the most.
Next, we calculate the average contribution per person, that is, the total contributions divided by the number of data journalists. New York is hard to find in this picture, isn't it? The wider the layer, the higher the average contribution.
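The average is a plain division of two per-city figures. Here is a minimal sketch with invented city names and numbers (none of them come from our dataset), showing how a city with many data journalists can still end up with a low average:

```python
def average_contributions(total_by_city, journalists_by_city):
    """Average contributions per data journalist for each city.

    Cities with no listed journalists are skipped to avoid dividing by zero.
    """
    averages = {}
    for city, count in journalists_by_city.items():
        if count > 0:
            averages[city] = total_by_city.get(city, 0) / count
    return averages


# Placeholder figures (assumptions for illustration only).
total_by_city = {"City A": 3000, "City B": 2400, "City C": 900}
journalists_by_city = {"City A": 60, "City B": 12, "City C": 3}

averages = average_contributions(total_by_city, journalists_by_city)
for city, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{city}: {average:.1f} contributions per journalist")
```

In this toy example City A has the highest total and the most journalists, yet the lowest average, which is the same pattern the picture shows for New York.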
This raises questions: why does New York have the most data journalists but a lower average contribution than other cities? Is it a coincidence, or does programming perhaps not have a strong relationship with data journalism itself? We decided to repeat the same crawling procedure at a worldwide geographical scale to see whether the result would hold.
This map shows that even at world scale, a large number of data journalists does not mean a high average contribution, even when the total contribution is the highest worldwide. Hence we tend to believe that data journalism and programming are not as strongly related as we previously thought, or that programming is not the essential part of being a good data journalist.
To understand what matters for data journalists, we turned to Twitter and the job market. First, we scraped the tweets of all the data journalists to see what they care about.
After extracting the most frequent words in the tweets of US data journalists, we found that the most frequent word is ‘data’, while the rest relate to hot news topics, such as ‘trump’, ‘election’ and ‘time’.
Scraping data journalist job descriptions from Indeed and Careers (both job-hunting websites) shows that the most frequent word is still ‘data’. This time, however, some specific words appear that reflect the skills companies look for when hiring data journalists. Comparing the word frequency chart from Twitter with the one from the job descriptions, both charts emphasize storytelling ability.
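Word frequency counts like these, and the comparison between the two sources, can be sketched as follows. This is an illustration under stated assumptions: the stopword list, the tokenising rule, the function names and the sample texts are all invented here and are not the scraped tweets or job ads.

```python
import re
from collections import Counter

# A tiny stopword list for illustration; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "with", "on"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count lowercase words across all texts, ignoring stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords and len(w) > 1)
    return counts


def shared_top_words(counts_a, counts_b, n=10):
    """Words that appear among the n most frequent words of both sources."""
    top_a = {word for word, _ in counts_a.most_common(n)}
    top_b = {word for word, _ in counts_b.most_common(n)}
    return top_a & top_b


# Placeholder texts (assumptions for illustration only).
tweets = ["New data on the election", "Data story: how the election changed"]
job_ads = ["Strong data analysis and storytelling skills", "Tell a story with data"]

tweet_counts = word_frequencies(tweets)
job_counts = word_frequencies(job_ads)
print(tweet_counts.most_common(3))
print(shared_top_words(tweet_counts, job_counts))
```

Words that rank highly in both sources are the ones most likely to reflect what data journalists talk about and what employers ask for at the same time.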
What’s more, basic analysis skills and the use of tools are mentioned only a few times. However, it is hard to say what the word ‘data’ really means in either chart. It could simply refer to using data to support the content of a news story, or to the methods data journalists use to collect data.
Grouping the frequent words into different dimensions shows that there are some important skills and abilities one should acquire to become a data journalist. On one hand, the frequent words from Twitter fall into two groups, which show the most common topics data journalists talk about in daily life. On the other hand, the frequent words from the job descriptions show the requirements of companies looking to hire data journalists. For a clearer view of this information, you can refer to the following talent portrait.
You can see that a data journalist needs abilities in basically three areas. The data part requires a general understanding of how to collect and use data. The storytelling part requires strong expression and a good sense for stories and news. The experience part requires past involvement in research, teamwork, and communication with news agencies or institutes. Last but not least, curiosity about new things, an interest in the news, and a relevant degree or diploma are also considered important qualities for becoming a good data journalist.
Online News Association
Indeed
jobsDB
Global Data Journalists Directory
Where in the world can I study data journalism?