The 93rd Installment
Future Challenges and Direction of AI in the Deep Reinforcement Learning Development Process

by Hisashi Hayashi,
Associate Professor, Master Program of Innovation for Design and Engineering

Currently, we are said to be experiencing the third wave of artificial intelligence (AI). The technology at the heart of this wave is undoubtedly the neural network technology called deep learning. I would like to examine the future challenges and trends of AI based on the three deep learning technologies that have emerged so far.

The third AI wave was sparked by a deep learning technology called convolutional neural networks (CNNs). Using CNNs, Dr. Hinton's team from the University of Toronto won the ILSVRC-2012 image recognition competition. Deep learning has been in the spotlight ever since, and a Microsoft-led team's AI surpassed human image recognition accuracy at ILSVRC-2015. However, this is image recognition technology, which differs from what most people imagine AI to be. The popular image of AI is something that has emotions like Doraemon, thinks autonomously about what to do and how to act, communicates naturally with humans, and cooperates with its peers. I, too, am interested in such “AI-like” AI.

The next deep learning technology to attract wide attention was the deep Q-network (DQN), presented in 2013 (and published in Nature in 2015), which combines deep learning with reinforcement learning. In reinforcement learning, an agent learns rules for choosing, from the current situation, the actions that will yield higher rewards in the future. This combination allowed AI to score more points than existing algorithms and to play roughly on par with human players in reflex-driven computer games such as Breakout on the Atari 2600. In fact, the combination of neural networks and reinforcement learning has been around for a long time; what is new is that, by using CNNs for image recognition within part of the network, the system can learn the next action to take (in the case of Breakout, whether to move the paddle left or right, or to keep it in place) directly from the pixel information and score on the game screen.
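The learning rule at the heart of a DQN can be sketched with the tabular Q-learning update it generalizes; the deep network simply replaces the table with a function from screen pixels to action values. The following is a minimal toy illustration, not the DQN algorithm itself — the states, rewards, and action names are hypothetical:

```python
# Minimal tabular Q-learning sketch (a DQN replaces this table with a
# neural network mapping screen pixels to Q-values).
# States, actions, and rewards here are hypothetical toy values.
from collections import defaultdict

ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor: how much future reward matters

Q = defaultdict(float)                # Q[(state, action)] -> expected future reward
ACTIONS = ["left", "stay", "right"]   # Breakout-style paddle moves

def update(state, action, reward, next_state):
    """One Q-learning step: nudge Q toward reward + discounted best next value."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def greedy(state):
    """Pick the action with the highest learned value (the 'intuitive' choice)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

# A toy episode step: moving right in state "s0" earns a point and leads to "s1".
update("s0", "right", 1.0, "s1")
print(greedy("s0"))  # -> "right" after the rewarded transition
```

The "rule" learned here is exactly the kind mentioned above: a mapping from the current situation to the action with the highest expected future reward.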

Unlike basic CNNs, DQNs perform not only image recognition but also situation-dependent judgment and action, and so come closer to the AI people ordinarily imagine. However, I feel there are certain limitations, and that something is not quite right. This method is limited to intuitively judging the next single move and acting on it reflexively; it does not act systematically toward a distant future goal. What feels off is that the input is an image, and the next action is output without abstracting or symbolizing the information. DQNs could be applied straightforwardly, for example, by feeding the “image” of a stock price graph into a DQN to make trading decisions, or by using the “screen” of a traffic simulator as input to a DQN that switches traffic signals at intersections to avoid congestion. Nevertheless, the fact that the system relies only on intuitive judgments, with everything forced into image form as input, still leaves the sense that something is missing.

The last deep learning-related AI technology I would like to focus on is AlphaGo, which defeated the human Go champion in 2015. That algorithm was also published in Nature in 2016. What makes AlphaGo unique is that it adds deep reinforcement learning techniques to the conventional game tree search. In other words, it uses the results of deep reinforcement learning to determine which branches to explore in the game tree search. Therefore, it explores not only the next move, but also future moves.
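The idea of letting learned judgments steer the search can be sketched in miniature. AlphaGo itself couples its policy and value networks to Monte Carlo tree search; the toy below substitutes hand-written stand-in functions for the networks and a simple depth-limited negamax for the search, purely to show the shape of the combination — every game detail here is a made-up assumption:

```python
# Sketch of heuristic-guided game-tree search: learned policy/value functions
# (stand-ins here) decide which branches to expand, in the spirit of AlphaGo's
# networks guiding its tree search. The "game" is a toy over integers.

def policy_prior(state, moves):
    """Stand-in for the policy network: score moves; higher = more promising."""
    return {m: -abs(m) for m in moves}   # toy rule: prefer small moves

def value_estimate(state):
    """Stand-in for the value network: evaluate the position for the player to move."""
    return -abs(state)                   # toy rule: positions near zero are better

def legal_moves(state):
    return [-2, -1, 1, 2]                # toy move set

def apply(state, move):
    return state + move                  # toy transition

def search(state, depth, top_k=2):
    """Negamax that expands only the top_k moves ranked by the policy prior."""
    if depth == 0:
        return value_estimate(state)
    prior = policy_prior(state, legal_moves(state))
    candidates = sorted(prior, key=prior.get, reverse=True)[:top_k]
    # The opponent's value is negated: what is good for them is bad for us.
    return max(-search(apply(state, m), depth - 1, top_k) for m in candidates)

print(search(3, depth=2))
```

Pruning the search to the moves the prior rates highest is what makes looking several moves ahead tractable; without the learned heuristic, the branching factor of Go makes the full tree hopeless.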

The deep learning that AlphaGo uses also relies on CNNs for image recognition, but here I feel no such discomfort. That is because the sequence is human-like: image recognition, intuitive judgment, planned thinking (game tree search), and then action. I think the reason I felt uncomfortable with the DQNs mentioned earlier is that their thinking part consists only of intuitive judgments rather than planned thinking, and that image recognition and intuitive judgment are fused, inseparably, within the DQN's neural network. In AlphaGo, by contrast, image recognition and intuitive judgment are likewise integrated in the neural network, but planned thinking (game tree search) is separated from it.

AlphaGo continues to evolve. According to a paper published in Nature in October 2017, AlphaGo Zero, which learns solely through reinforcement learning from self-play and uses no historical game records, now decisively beats the version of AlphaGo that was trained on game records. And according to a paper posted to arXiv on December 5, 2017, AlphaZero, a more general version of AlphaGo Zero's algorithm, can even defeat the world's strongest shogi and chess programs after only a few hours of self-play reinforcement learning.

Looking back at the development of deep reinforcement learning: it began with real-world sensing technology, namely image recognition with CNNs; next, DQNs learned rules for reflexive behavior through deep reinforcement learning; and finally came AlphaGo, which uses the results of deep reinforcement learning as heuristics for game tree search. This is a bottom-up approach that advances intelligence by gradually abstracting and symbolizing sensed information from the real world. The common feature among all three is that deep learning is applied only to the neural-network parts that handle “image recognition” and “intuitive judgment.” Neural networks, reinforcement learning, game tree search, and their combinations have all been improved, but the basic ideas had been studied long before deep learning.

Old-fashioned AI was a top-down approach centered on reasoning in an abstract, symbolic world. This is close to the reasoning of human beings, who think in the symbols of language. However, the gap between the symbolic world and the real world is huge, and such AI was often criticized as being of no practical use. In contrast, today's deep learning AI is a bottom-up approach grounded in the real world through neural networks, but advanced reasoning in an abstracted, symbolic world is yet to come. The biggest challenge ahead will be to successfully integrate today's bottom-up approach with the old-fashioned top-down approach. Furthermore, while the ultimate dream is to achieve the intelligence of a single robot like Doraemon, another important theme is to have the countless advanced AIs expected in the future cooperate with one another in today's networked society.
