In the long pursuit of artificial intelligence, computer scientists have designed and developed all kinds of complex mechanisms and technologies to create analogs of vision, language, reasoning, motor skills, and other abilities inherent in intelligent life. Although these efforts have produced weak (narrow) AI systems that can effectively solve specific problems in limited conditions and environments, such systems still cannot hold a candle to the intelligence of humans, or even of animals.
In a new article published in the peer-reviewed scientific journal Artificial Intelligence, DeepMind scientists argue that intelligence and its associated abilities will emerge not from formulating and solving complex problems, but from adhering to a simple yet powerful principle: reward maximization. In other words, reinforcement learning.
"Reward is Enough", which was still in pre-publication review at the time of this writing, draws on the evolution of natural intelligence as well as lessons from recent advances in AI. The authors suggest that reward maximization and trial and error are sufficient to develop behaviors that exhibit intelligence-related abilities. From this they conclude that reinforcement learning, a branch of AI built on maximizing rewards, can lead to the development of a strong form of artificial intelligence.
Two paths for AI
One common method for creating AI is to try to reproduce elements of intelligent behavior in computer systems. For example, the study of the mammalian vision system has led to various kinds of artificial intelligence systems that can classify images, locate objects in photographs, determine the boundaries between objects, and much more. Likewise, our understanding of language has helped in developing natural language processing systems for question answering, speech recognition, text generation, and machine translation.
But these are all examples of highly specialized systems designed to perform specific tasks. Some scientists believe that combining several specialized AI models will lead to smarter systems. For example, this could be a software package that coordinates separate modules for computer vision, speech processing, NLP, and motor control to solve complex problems that require many skills. This is the first approach.
The second, proposed by the DeepMind researchers, is to recreate the simple but effective mechanism that led to the emergence of the mind. Namely, reinforcement learning. Over billions of years of natural selection and random variation, this mechanism allowed certain forms of life to evolve; those that failed to cope sooner or later left the arena of life. The creatures that remained were the ones best suited to solving the tasks their environment posed.
This is because, according to the researchers, environmental conditions are so challenging that the creatures living in them must develop complex abilities to achieve their goals. In other words, a simple reward-maximization model fosters not a "spherical mind in a vacuum" but the emergence of a complex mind capable of solving complex problems.
Take a squirrel. A hungry squirrel looks for food, which makes sense. But a squirrel that can only find food and eat it on the spot will not survive the winter; in the cold season it will starve. A squirrel that has learned to hide nuts in a particular place, a pantry, and, moreover, can remember that place will most likely survive.
Developing abilities by maximizing rewards
In their article, the researchers provide several examples of how "intelligence and related abilities are implicitly generated to maximize one of the many possible reward signals."
Various kinds of sensory skills, for instance, help animals survive in harsh environments. Object recognition lets animals detect food, recognize kin, react to threats, and avoid traps. Image segmentation helps them distinguish complex objects and avoid deadly mistakes such as falling off a branch. Hearing saves an animal when visibility is poor or absent, and the senses of taste and smell likewise increase its chances of survival.
The article also discusses abilities that can arise from reward-based learning, such as the fundamentals of language, social intelligence, imitation, and general intelligence, which the authors describe as "maximizing a single reward in a single complex environment."
According to the researchers, this entire evolutionary path to intelligence can be replicated for AI.
Reinforcement learning to maximize rewards
Reinforcement learning is a family of AI algorithms built around three main elements: an environment, agents, and rewards.
In the setting the scientists consider, an agent's actions change both the state of the environment and the state of the agent itself. Depending on whether those changes move the agent toward its goal or away from it, the agent is either rewarded or penalized.
In many reinforcement learning experiments, the agent has no prior knowledge of the environment, so it starts from scratch, acting randomly. Then, using the experience it gains, the agent adapts, adjusting its actions and developing strategies that maximize its reward.
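To make this loop concrete, here is a minimal, self-contained sketch of trial-and-error learning. The five-cell corridor environment, the constants, and all names are illustrative assumptions, not taken from the DeepMind paper: a tabular Q-learning agent starts out acting randomly and gradually learns to walk right toward a rewarded goal cell.

```python
import random

N_STATES = 5          # corridor cells 0..4; the reward sits in cell 4
ACTIONS = [-1, +1]    # step left / step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q-table: the agent's estimate of long-term reward for each (state, action).
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    if nxt == N_STATES - 1:
        return nxt, 1.0, True   # reward only at the goal cell
    return nxt, 0.0, False

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit what was learned, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward
        # observed reward + discounted best future value.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# After training, the greedy policy in every non-goal cell is "step right" (+1).
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Note that nothing about "walk right" is hard-coded: the agent is rewarded or not, and the behavior emerges from maximizing that signal, which is the point the article generalizes.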
In their article, the DeepMind researchers propose reinforcement learning as the core algorithm that can reproduce the reward maximization observed in nature and ultimately lead to full-fledged AI. However, the experts do not guarantee that a strong form of AI will emerge from reinforcement learning. It might take an agent hundreds of years to get there, and in the end we might only get an AI capable of playing computer games. So far, scientists have not developed reinforcement learning methods that can combine and generalize knowledge gained across different domains.
Researchers acknowledge that learning mechanisms to maximize rewards are an unsolved problem that warrants careful further study.
Strengths and Weaknesses of Reward Maximization
A number of experts disagree with the article's claims. Data scientist Herbert Roitblat, for example, argues that simple learning mechanisms based on trial and error are not enough for intelligence to emerge. According to Roitblat, the article's assumptions run into a number of problems when it comes to testing them in real life.
More generally, Roitblat believes it would take far too long for intelligence to emerge through these mechanisms alone. Sooner or later, an endless number of monkeys typing on countless typewriters will produce the text of the Iliad, of course. But for a strong form of AI to appear in the foreseeable future, this is clearly not enough.
“Once the model and its internal representation are in place, optimization or enhancement can guide its development, but that does not mean that reinforcement is sufficient,” he comments on his colleagues' work. He also criticizes the article for not specifying how reward, actions, and the other elements of reinforcement learning are defined.
Overall, the critics grant that reward-maximizing reinforcement learning is a useful tool. But for intelligence, natural or artificial, to emerge, it is not enough; most likely, other factors must act on a living organism or a computer system in conjunction with learning.