Researchers from Portugal and the US are working on an algorithm designed to perform ‘computational fact-checking’ of information posted on the Internet.

Automating Fact-Checking via Knowledge Map and Algorithm

A fact-checking algorithm could very quickly become the journalist’s best friend, given the pressing need to get ‘scoop’ stories written at high speed while coping with the non-stop flow of online information. Now a paper entitled ‘Computational Fact Checking from Knowledge Networks’, written by researchers from the Center for Complex Networks and Systems Research at the School of Computing, Indiana University, and the Instituto Gulbenkian de Ciência in Oeiras, Portugal, indicates that this is far from being a totally unrealistic proposition. In the paper, the team of six researchers describe an approach to the kind of tool that could be used to help sift through the mass of information whizzing around on the web and sort the true from the false. The idea is that the system could save you time and effort by filtering out of your search results any that are erroneous, deliberately misleading or simply zany.

The principle behind the algorithm is basically quite simple: it involves using ‘knowledge graphs’ rather like those produced by Google. The ‘computational fact-checking’ system will break a given statement down into three pieces: subject, predicate, object. The researchers take as an example ‘Socrates is a person’. Their grammatical and semantic analysis of this particular sentence could be questioned but their approach is clear: studying the links between the subject and the object or complement of the sentence, which they call ‘entities’. Each entity will constitute a node on the graph being created and the nodes are linked by a variety of different verbs. They refer to the lines connecting the nodes as ‘edges’. When all this information is put together we have a ‘knowledge graph’. The researchers have applied this approach to a number of topics using the Wikipedia knowledge repository. In order to determine whether or not a sentence such as ‘Barack Obama is a Muslim’ is accurate, their procedure establishes a link between the subject (Obama) and the complement (Muslim). The principle is that the longer and more complex the path between the two component ‘entities’ of the sentence, the less likely it is to be true, which is exactly the case with this example.
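
To make the principle concrete, here is a minimal sketch in Python of the idea as described above, not the researchers' own implementation. The handful of triples is invented for illustration, the function names are hypothetical, and the score is a deliberate simplification: it assumes that the number of edges on the shortest path between two entities is a fair stand-in for how 'long and complex' the link is.

```python
from collections import deque

# Toy (subject, predicate, object) triples, invented for illustration only.
TRIPLES = [
    ("Socrates", "is a", "person"),
    ("Socrates", "born in", "Athens"),
    ("Athens", "located in", "Greece"),
    ("Plato", "student of", "Socrates"),
    ("Greece", "located in", "Europe"),
]


def build_graph(triples):
    """Turn triples into an undirected graph: entities are nodes, statements are edges."""
    graph = {}
    for subject, _predicate, obj in triples:
        graph.setdefault(subject, set()).add(obj)
        graph.setdefault(obj, set()).add(subject)
    return graph


def path_length(graph, source, target):
    """Breadth-first search: number of edges on the shortest path, or None if unconnected."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def truth_score(graph, subject, obj):
    """Assumed scoring rule: 1.0 for a direct link, shrinking as the path gets longer."""
    dist = path_length(graph, subject, obj)
    if dist is None:
        return 0.0  # no path at all: the graph offers no support for the statement
    return 1.0 / max(dist, 1)


graph = build_graph(TRIPLES)
print(truth_score(graph, "Socrates", "person"))  # directly linked entities
print(truth_score(graph, "Plato", "Europe"))     # linked only through several intermediaries
```

In this toy graph a statement linking two directly connected entities gets the highest score, while one that can only be reached through a chain of intermediaries scores lower. A real system would need a far larger graph and, most likely, a more refined measure than raw path length.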

This approach, if it really can be made to work, will make the task of fact-checking much easier for a journalist or writer. Based on a ‘knowledge graph’, the system would assess just how direct the link is between the subject and the object or complement. Some commentators even envisage being able to install a plug-in on the Chrome browser which would automatically use this kind of algorithm to sort non-verified search results, putting the most trusted links at the top of the page. Everyone searching the Internet for information would then be able to apply the standards demanded of a professional journalist. Meanwhile, part of a journalist’s work would become automated, which would of course have some impact on the profession. We have already seen moves towards ‘crowd-checking’ but the plug-in fact-checker would be a further step towards automated sorting of online information. For their part, the researchers claim that their “findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.”

Interesting though this research undoubtedly is, the algorithm is still likely to run up against a major problem which besets many automated language-processing projects: language traps, including irony, metaphor, hyperbole, metonymy, and many other modes of expression – in short, the kind of nuance that a mathematical model might find it virtually impossible to deal with. It remains to be seen therefore whether this knowledge map analysis approach – which the researchers have so far only tested on the supposedly trustworthy Wikipedia – can in fact be applied more generally to information posted across the wide expanses of the Internet.

By Guillaume Scifo