In early July Google announced that it was setting up a new research group to work on artificial intelligence (AI). The purpose of this initiative, dubbed PAIR, short for People + AI Research, is to bring Google’s researchers and resources together to study and redesign the ways people interact with AI systems, focusing on the human side of AI.

PAIR research is divided into three areas, based on different user needs. The first area concerns the engineers and researchers who design AI systems: the PAIR team aims to make it easier for them to build and understand machine learning systems. The second is targeted at ‘domain experts’, looking at how AI can help professionals in their work; here Google cites doctors, technicians, designers, farmers and musicians. The third concerns everyday users. Here Google is essentially asking: “How might we ensure that machine learning becomes inclusive, so that everyone can benefit from breakthroughs in AI? Can design thinking open up entirely new AI applications? Can we democratise the technology behind AI?”

In addition to publishing its research, PAIR promises to create educational materials and open source new tools. Google has already matched its stated intentions with action, accompanying the launch of PAIR with the open sourcing of two visualisation tools that give AI engineers a clear view of the data they use to train AI systems.


This is not the Mountain View-based Internet giant’s first move to open source its code. In 2015, Google decided to open up its own ‘open-source software library for Machine Intelligence’ – TensorFlow, a system for building and training neural networks to detect and decipher patterns and correlations in ways analogous to the functioning of the human brain. Such networks underpin deep learning techniques and have been responsible for most of the recent progress in AI. Until then, a number of observers had remarked that Google’s policy of developing AI purely for internal use was overly rigid. As Cade Metz wrote in iconic technology magazine Wired: “Typically, [Google] didn’t share its designs with the rest of the world until it had moved on to other designs. Even then, it merely shared research papers describing its tech. The company didn’t open source its code. That’s how it kept an advantage. With TensorFlow, however, the company has changed tack, freely sharing some of its newest – and, indeed, most important – software. Yes, Google open sources parts of its Android mobile operating system and so many other smaller software projects. But this is different. In releasing TensorFlow, Google is open sourcing software that sits at the heart of its empire.”
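To make “building and training a neural network to detect patterns” concrete, here is a minimal sketch written in plain NumPy rather than TensorFlow itself, so the mechanics a framework like TensorFlow automates (forward pass, gradients, weight updates) are visible. The architecture, learning rate and toy task are all invented for illustration:

```python
import numpy as np

# Toy task: learn XOR, the classic pattern a purely linear model cannot capture.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# A tiny two-layer network: 2 inputs -> 8 hidden units -> 1 output.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses, lr = [], 1.0
for _ in range(10000):
    # Forward pass: compute the network's predictions.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(float(-(y * np.log(p + 1e-9)
                          + (1 - y) * np.log(1 - p + 1e-9)).mean()))
    # Backward pass: gradients of the mean cross-entropy loss.
    dp = (p - y) / len(X)
    dW2 = h.T @ dp; db2 = dp.sum(0)
    dh = (dp @ W2.T) * (1 - h ** 2)   # backprop through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    # Gradient-descent update of the weights.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("loss went from %.3f to %.3f" % (losses[0], losses[-1]))
print("predicted labels:", (p > 0.5).astype(int).ravel())
```

A framework such as TensorFlow performs this same loop, but computes the gradients automatically and can run the arithmetic on GPUs at vastly larger scale.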

Elon Musk's OpenAI

Google is far from being the only company to favour the open AI approach. In the world of artificial intelligence the most famous initiative is without doubt OpenAI, a non-profit artificial intelligence research company founded by Elon Musk and Sam Altman, whose mission is to promote and develop AI in such a way that it will benefit humanity as a whole. OpenAI shares all its research results with the general public. There is also the Partnership on Artificial Intelligence to Benefit People and Society, an organisation which brings together the heavyweights of the industry – Apple, Google, Amazon, Microsoft and IBM – to focus on open source code in order to maximise the benefits of AI for the largest possible number of people.

Greater willingness to hand over algorithms than data

The ‘open source’ approach therefore appears to have the wind in its sails among the AI community. However, this rather fuzzy term may refer to either of two very different realities. Opening up code which governs the functioning of AI algorithms is not the same thing as opening up data that enables developers to train the algorithms. Companies tend to be more willing to open up their code rather than their data. “Many companies have realised that algorithms will be developed with or without their involvement,” says Zachary Chase Lipton, a researcher specialising in artificial intelligence who writes a blog called Approximately Correct.

"Many companies have realised that algorithms will be developed with or without their involvement"

“So it’s very much in their interests to publish their research in order to position themselves as leaders. However, it may prove hard to get hold of their data. When Microsoft and Google train their voice recognition systems, for instance, they use proprietary databases which are much larger than the ones they pass on to the general public.” Of course, a desire to keep one step ahead of the competition is not the only argument against opening up data. When it comes to medical data, for example, or user data sourced from the Internet, making this information public may mean compromising people’s data privacy. Lukas Biewald, CEO of San Francisco startup CrowdFlower – which processes data for third parties using artificial intelligence – shares Lipton’s view.

An expert eye

Lukas Biewald 

CEO, CrowdFlower

Google can safely open-source their core technology because without training data for their algorithms, you can’t build a search algorithm anywhere near as good as Google’s 

“Google can safely open-source their core technology because without training data for their algorithms, you can’t build a search algorithm anywhere near as good as Google’s,” writes Biewald, adding: “And Google knows that no one can build a training dataset as good as they have. By open-sourcing their algorithm they can count on the whole world to help (…) but by keeping their data, they keep a large moat between them and their competitors.” So Biewald is essentially arguing that data is the key to competition. “A company’s intellectual property and its competitive advantages are moving from their proprietary technology and algorithms to their proprietary data. As data becomes a more and more critical asset and algorithms less and less important, we can expect lots of companies to open source more and more of their algorithms,” he predicts. Recent events seem to bear this out, or at least confirm the appetite new technology companies have for data. A report on machine learning (a form of artificial intelligence that allows computer systems to learn from examples, data and experience) published in April by the Royal Society, the UK’s equivalent of an academy of sciences, points in the same direction. The report, to which Google DeepMind, Uber and Amazon contributed, makes a plea for larger volumes of data to be opened up in order to help the technology advance. This is no coincidence: Google DeepMind, Uber and Amazon have all built their business models on the wealth of data they collect.

Establishing leadership credentials and attracting talent


With this reservation in mind, it is worth pointing out that the world of AI makes considerable use of open source code and algorithms but currently opens up data to a far lesser extent. So how can it be in a company’s interests to make the results of its work public when by doing so it runs the risk of giving its competitors a helping hand? First and foremost, publishing results boosts the company’s leadership credentials in the market. Opening up TensorFlow has enabled Google to morph into a fully-fledged artificial intelligence platform and to establish itself as a leader in the sector. The TensorFlow software has been widely adopted by engineers all over the world and has become a standard among the machine learning community. It has also become the most popular project on GitHub, the leading international portal for software developers, and is being used to create tools in many different sectors, from aerospace to bio-engineering. This leads us to the second advantage: when a company opens up its algorithms, it essentially farms out part of its work, allowing software developers worldwide to take its research forward and thus potentially improve its products. This also enables the company to identify promising talent and recruit those people into its ranks. “Companies that decide to open source their technology can then plug into the live wires in the open source code community and benefit from those people’s research to improve their software. They will also have access to a large number of people who are familiar with their tools, and whom they can recruit,” points out Zachary Chase Lipton. For instance, students working as interns at Google can continue to write code when their internship is over, so the company can immediately benefit from their work and perhaps later offer them a job once they have completed their studies.

Building ethical artificial intelligence

"Working towards open AI is the best way of avoiding mishaps."

Artificial intelligence is a topic that scares and fascinates people in equal measure, so if companies working in this sector can demonstrate transparency and work in a collaborative manner, this can only help to ensure a positive image among the general public and with potential customers. Moreover, working towards open artificial intelligence is the best way of avoiding excesses or mishaps. This aspect is becoming more important as algorithms come to play an ever-larger role in our lives. They are now being used by recruiters to select candidates for a vacant post, by the police to identify potential crime-risk zones, by banks to decide whether or not to grant a loan, and so on. It is therefore essential that these algorithms be as neutral and impartial as possible, and opening up data is one way to help ensure this, argues Zachary Chase Lipton. “One of the biggest advantages of opening up data to the general public is that it allows us to address the fairness issue. If models are trained using datasets which reflect people’s prejudices, a model trained to imitate those decisions could well reproduce the same prejudices. Placing data in the public domain will allow researchers to identify these potential problems,” he underlines.

Another fear that many people have when it comes to AI – some see it as a real issue while others view it as merely an imaginary risk – is that sometime in the future AI systems may become too powerful and run out of control. This is one of the reasons behind the establishment of OpenAI. Founder Elon Musk has on many occasions expressed the view that AI does entail risks for humanity. The electric car and battery entrepreneur argues that one way of guarding against this is to make AI as open as possible, so that it can be kept under proper control and will not fall into the hands of a minority.
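Lipton’s fairness point can be illustrated with a simple audit: given a model’s decisions and a sensitive attribute, compare approval rates across groups (a basic ‘demographic parity’ check). The group names and figures below are entirely invented for illustration:

```python
# Hypothetical audit of a loan-approval model's decisions.
# Groups and outcomes are invented; real audits would use real decision logs.
decisions = [
    # (applicant group, model approved?)
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]

def approval_rates(records):
    """Approval rate per group: approvals divided by applications."""
    totals, approved = {}, {}
    for group, ok in records:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + int(ok)
    return {g: approved[g] / totals[g] for g in totals}

rates = approval_rates(decisions)
print(rates)                                 # {'group_a': 0.75, 'group_b': 0.25}
gap = max(rates.values()) - min(rates.values())
print(f"demographic parity gap: {gap:.2f}")  # 0.50 - a large gap worth investigating
```

A large gap does not by itself prove the model is unfair, but it flags exactly the kind of inherited prejudice Lipton describes, and open data makes such checks possible for outside researchers.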
This is also one of the reasons for the Partnership on Artificial Intelligence, involving Apple, Google, Amazon, Microsoft and IBM. The choice of an open source approach to code is a means of ensuring that, in the long term, AI does not become too smart for its human originators, with potentially deadly consequences. And it is also a way of demonstrating to the general public that these companies care as much about the common good as they do about business profits.

By Guillaume Renouard