Should we be afraid of robot journalism? Belgian journalist Laurence Dierickx has just devoted her thesis at the Université libre de Bruxelles (ULB) to this question. The European Federation of Journalists (EFJ) asked her to share the results of her research, which highlights the ability of robots to produce objective, accurate and comprehensive news stories, although these may not be as readable as those written by human journalists. Here is her article, and we can guarantee that it was not written by a robot.
Automatic generation of text (AGT) in natural language, a branch of natural language processing (NLP), was deployed on a large scale in Europe during the French departmental elections of 22 March 2015. During the election coverage, Le Monde, in partnership with the Parisian start-up Syllabs, published more than 30,000 articles produced by robots. In October 2014, AFP, through the German company Sport Informations Dienst, began automatically generating sports news in thirteen languages. Before that, the Associated Press had signed a partnership with Automated Insights to automate the writing of companies' quarterly earnings reports.
Companies specialising in automated content generation are in high demand in the media industry, at a time when news organisations need original content at a lower cost. These technologies can also process huge amounts of data in record time.
However, many questions remain unanswered, particularly in the context of the economic crisis and its impact on jobs. How will this affect the working conditions of journalists? What about journalistic ethics and the quality of information? According to some industry observers, the fear of robot journalism should be weighed against the benefits it brings.
Are robots a threat or an opportunity for journalists?
“I don’t think it can be a threat to journalists. The work of journalists cannot easily be automated. No machine will ever be able to build the same relationship that journalists have with their sources. The human relationship remains essential,” said Ricardo Gutiérrez, General Secretary of the European Federation of Journalists (EFJ).
“Those most threatened are the journalists whose work is based on results in finance and sport. But it is not so serious, because the added value of human input there is low. On the contrary, I think that robots taking on the most tedious tasks can help journalists and free them to do more interesting work with more added value: tasks that are more intelligent, more prestigious and more significant for their role in society. It is one more tool,” said Eric Scherer, Director of Future Media at France Télévisions.
The founder of Journalism++, Nicolas Kayser-Bril, added:
“Like any innovation, the technologies that appear in the world of journalism will give rise to tensions. We are inventing new tools that allow journalists to work on tasks with strong added value. However, it is up to employers to decide whether to replace journalists with robots. Today, the skills that are in demand and that add value are not the same as thirty years ago, or even five years ago. Adaptation is therefore necessary. But none of this is new. Hand typesetting had been essential since the 16th century, yet it had become obsolete by the end of the 19th century.”
AGT technology is part of what Uricchio calls the “algorithmic turn”, a turn that covers the entire chain of journalistic production: from the detection of breaking news to personalised publication, through the selection of information and the prioritisation of content. The automation of tasks and processes does not happen without human intervention; after all, someone created the algorithm in the first place. “Algorithms lead to editorial choices, which have always existed in journalism,” notes Paul Bradshaw, a British journalist and new media expert.
What is our perception of automated content?
Because the topic is relatively new in journalism, little research has been done on how automatically generated articles are perceived. In Sweden, Clerwall studied how journalism students perceived automatically generated sports articles. In the Netherlands, Krahmer and van der Kaa explored the credibility of generated content, focusing on the differences and similarities between articles written by journalists and automatically generated ones. Although the samples and corpora were limited and these two studies do not allow general conclusions to be drawn, the research so far suggests that readers perceive little difference between automated content and content written by human journalists.
For my thesis at the Université libre de Bruxelles (ULB), supervised by Seth van Hooland, I focused on how journalists themselves perceive generated content. To do so, I used evaluation methods from computational linguistics. First, a corpus of twenty automatically generated texts in English, on finance and economics, was run through a series of automatic metrics. The goal was to assess the correctness of the language. In this first stage, the automated content quite often obtained higher readability scores than articles written by journalists.
The three texts with the highest scores were then submitted to professional writers: journalists, editors, copywriters and others. The aim was to evaluate how they perceived the quality of the content. These participants were recruited through several professional organisations in the United Kingdom and the United States. Eighty people took part in this evaluation, whose methodology was inspired by Clerwall's; 75 of them completed it. To keep their judgement neutral, participants were not told that the texts had been automatically generated.
In the first round, participants rated the perceived quality of the texts against twelve descriptors. The three criteria with the best average scores were objectivity (68.46%), accuracy (65.69%) and comprehensiveness (65.17%). The lowest average scores were for reading enjoyment (51.52%), interest (51.51%) and writing quality (60.39%). Participants could also choose not to answer a question and, if they wished, explain why. Notably, their criticism most often concerned sources, which they described as unreliable or insufficient (32.9%).
In the second round, evaluators were asked who they thought the author of the texts was: a human being or software? 52% of participants believed the texts had been written by a human. However, this result should be nuanced according to the participants' professional sub-group: journalists appeared to be the most skeptical.
Laurence Dierickx is a freelance journalist and developer based in Belgium who has worked professionally for more than fifteen years. She collaborates with small newsrooms in her native country as well as with Iwacu, an independent media outlet in Burundi. She designs news websites and news applications that automate data processing for investigative purposes, and she develops storytelling tools for journalists. She teaches data journalism and multimedia storytelling at the IUT of Lannion (Rennes University, France). She is also a PhD student at the Université libre de Bruxelles (ULB), where she researches the relationship between journalists and algorithmic programs through the lens of their uses. Her approach is transdisciplinary and shaped by her hybrid professional background.
Photo: (c) The People Speak!