- The texts generated by the dialogue system correspond to “common sense”.
- The system's responses match the context of the dialogue and the person's expectations.
- Understanding the goals and intentions behind a person's statements in the dialogue.
Understanding the meaning cannot be fully reduced to understanding the context of the dialogue, since the interlocutor's statement can be interpreted in different ways, and it is not clear which interpretation the state of understanding should correspond to. Can what the interlocutor (a person) considers “errors” be interpreted as the system understanding the meaning of the expression differently? To a greater extent, understanding the meaning refers to understanding the intentions and goals of the statement, and that is a separate topic in the theory of mind. “Common sense” as a criterion for understanding can be defined more precisely: in a general sense, it is the correspondence of the answer to the picture of the world, and it is verifiable. Today this is the best criterion for understanding the context of a dialogue by artificial agents such as dialogue bots. But so far bots have not shown success in this.
Analysis of approaches
A relevant answer is the simplest criterion of the bot's understanding of the interlocutor (a person). But this criterion is easy to "fake", as participants in the Loebner Prize have demonstrated more than once. It is achieved by attaching a large number of variable response templates to the "intents" recognized by a neural network. It is difficult to call this understanding. And the success of such bots is modest: they recognize mixed intents extremely poorly, and one question outside the templates is enough for the system to fail. This is easy to check on bots such as Alice from Yandex and Siri from Apple. We can say that the knowledge of the world in such systems is fragmentary.
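To make the template-on-intents scheme concrete, here is a minimal illustrative sketch; the intent labels, keywords, and templates are invented for the example, and a real system would use a trained neural classifier rather than keyword matching.

```python
import random

# Hypothetical intents with trigger keywords and canned response templates.
# A production bot would replace the keyword match with a neural intent classifier.
INTENTS = {
    "weather": {
        "keywords": ["weather", "rain", "forecast"],
        "templates": ["It looks like {condition} today.", "Expect {condition} this afternoon."],
    },
    "greeting": {
        "keywords": ["hello", "hi", "good morning"],
        "templates": ["Hello! How can I help?", "Hi there!"],
    },
}

def classify_intent(utterance: str):
    """Pick the first intent whose keyword occurs in the utterance (stand-in for a classifier)."""
    text = utterance.lower()
    for intent, spec in INTENTS.items():
        if any(kw in text for kw in spec["keywords"]):
            return intent
    return None  # off-template questions land here and the bot "fails"

def respond(utterance: str) -> str:
    intent = classify_intent(utterance)
    if intent is None:
        return "Sorry, I don't understand."  # no world model to fall back on
    template = random.choice(INTENTS[intent]["templates"])
    return template.format(condition="light rain")  # slot values would come from an API

print(respond("What's the weather like?"))
print(respond("Hello, will it rain on my birthday budget?"))  # a mixed intent collapses to one template
```

The fallback branch is all such a bot has outside its templates, which is exactly the failure mode described above.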
Another way is the construction of ontologies that cover all knowledge about the world so as to have answers to questions. This is realized through the ability to map a question onto a branch of the ontology and build the correct answer from the knowledge embedded in it. Ontologies thus claim to cover the whole picture of the world. This approach has been used by projects such as CYC and ABBYY Compreno. Now we can state that this way of solving the problem of "understanding" has failed: decades of manual construction of such ontologies by development engineers have not produced even one that does not fail on simple children's questions. The problems of ontologies are mainly associated with the metaphoric nature of language and with homonymy, which cannot be consistently laid down in a tree-like "logical" structure. The same attempts include the FrameNet project (Fillmore), which builds scenarios on the basis of semantic frames to reflect the facts of the world; for the same reasons it could not be developed further. It can be said that such systems cannot reproduce the parsimony of language, that is, its ability to describe a large number of concepts with a limited vocabulary through the mechanism of metaphor.
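For comparison, a minimal sketch of the tree-like ontology lookup described above; the classes and facts are invented for illustration, and the deliberately ambiguous word "bank" shows why homonymy does not fit into a single "logical" branch.

```python
# A toy tree-like ontology: each concept has one parent and a few attached facts.
# The classes and facts are invented for illustration only.
ONTOLOGY = {
    "animal":      {"parent": None,          "facts": {"is_alive": True}},
    "wolf":        {"parent": "animal",      "facts": {"eyes": 2, "legs": 4}},
    "institution": {"parent": None,          "facts": {}},
    "bank":        {"parent": "institution", "facts": {"holds": "money"}},
    # Where does the river "bank" go? A second node with the same name cannot
    # exist in this dict, and a single parent cannot cover both senses.
}

def lookup(concept: str, attribute: str):
    """Walk up the branch until some ancestor provides the requested fact."""
    node = ONTOLOGY.get(concept)
    while node is not None:
        if attribute in node["facts"]:
            return node["facts"][attribute]
        node = ONTOLOGY.get(node["parent"])
    return None  # off-branch question: the ontology simply has no answer

print(lookup("wolf", "eyes"))      # 2    -- answered from the concept itself
print(lookup("wolf", "is_alive"))  # True -- inherited from "animal"
print(lookup("bank", "slope"))     # None -- the river sense was never modelled
```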
Since 2010, more emphasis has been placed on neural networks, but they are limited in that they can make assumptions only within the picture of the world present in their training set. Networks such as GPT-3 (May 2020) from the OpenAI project, built on the Transformer architecture (a generative network trained on billions of words of text), generate grammatically correct “human” texts that are completely meaningless, which indicates a complete lack of “common sense”, as in the example below.
Example:
You: How many eyes has a wolf?
GPT-3: A wolf has four eyes.
You: Why four?
GPT-3: Two are for seeing in the dark with.
You: Why two?
GPT-3: The wolf needs to see if its food is ready.
You: And how many eyes has a human?
GPT-3: A human has two eyes and two ears to hear with.
You: And how many eyes has a blade of grass?
GPT-3: A blade of grass has one eye and one ear to see and hear with.
You: Why one?
GPT-3: To ensure it does not get stepped on.
We can say that such systems can generate a statement in response to any question, even one not found in the training texts, but they cannot be guaranteed to produce “common sense” facts consistent with the picture of the world.
There are combined approaches such as COMET, led by Yejin Choi (1), in which a hand-crafted knowledge base of facts about the world, Atomic, was used to fine-tune the pre-trained language model GPT-2. As a result, the network began to generate significantly more plausible facts that are absent from both Atomic and the GPT training set. But the success of such a project is also modest so far, since there is no guaranteed answer.
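The general shape of such fine-tuning can be sketched as follows; this is not COMET's actual training code, the two triples are invented Atomic-style examples, and the model name and hyperparameters are purely illustrative.

```python
# Rough sketch of the COMET idea: serialize knowledge-base triples into text
# and fine-tune a pre-trained GPT-2 on them.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Invented Atomic-style (head event, relation, tail) triples.
triples = [
    ("PersonX gives PersonY a gift", "xIntent", "to make PersonY happy"),
    ("PersonX spills coffee", "xEffect", "PersonX cleans the table"),
]

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for head, relation, tail in triples:
    # Serialize the triple as "<head> <relation> <tail>" and train the LM to continue it.
    text = f"{head} {relation} {tail}{tokenizer.eos_token}"
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After enough such steps the model can be prompted with an unseen "<head> <relation>"
# and asked to generate a plausible tail -- the "new facts" mentioned above.
```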
Of interest are DeepMind's systems, which, in addition to a neural network, maintain an external memory of facts (or experience). This allows them to learn the "rules of the game" without a teacher, simply by acting in the environment and recording the result, and even to learn by playing against themselves, which made it possible to beat human players in games such as Go. This is now considered the mainstream in building agents that "understand the world" of the game. But the architecture of such a self-learning system does not scale to any reality more complex than a game of black and white pebbles or a primitive Atari computer game. This way of learning clearly has a technological limit of complexity. We can say that such systems do not build a "picture of the world" by using existing knowledge to produce new knowledge and thereby save system resources; therefore, they need too many resources to learn even in poor environments.
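A minimal sketch of the "learn only from recorded experience" idea follows; the toy corridor environment and the plain Q-table are invented for illustration, while real DeepMind agents combine deep networks, replay memory, and self-play. The cost structure is the same, though: every fact about the world is bought with raw interaction.

```python
import random
from collections import defaultdict

# Toy environment: a 5-cell corridor with a reward at the right end.
# The Q-table is the agent's entire "picture of the world", filled only by trial and error.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)              # step left / step right
Q = defaultdict(float)

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

for episode in range(500):      # learn the "rules" purely from experience
    state, done = 0, False
    while not done:
        # Epsilon-greedy choice over the remembered values.
        if random.random() < 0.1:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += 0.5 * (reward + 0.9 * best_next - Q[(state, action)])
        state = nxt

# Every value in Q was paid for with interaction steps; nothing was derived from
# prior knowledge, which is why the approach becomes expensive in richer worlds.
print({k: round(v, 2) for k, v in sorted(Q.items())})
```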
Summary
What, then, can be called “understanding” in artificial systems from a pragmatic point of view? The common answer is that the agent must have knowledge. At the same time, as experience shows, it is impossible to construct comprehensive knowledge. Another answer may be consistency in the system's responses. But as we can see, systems trained on huge text corpora show no logical consistency in the statements they generate.
Understanding by an AI system means its ability to COMPLETE the picture of the world with plausible hypotheses built from fragmentary knowledge of facts of this world. And to save resources, the system must be able to use a limited language to describe an unlimited number of facts, which is achieved by mechanisms such as metaphor. At the moment, however, this mechanism is not understood well enough to be embodied in program code. The available theories of metaphor, such as conceptual metaphor or conceptual blends, are not algorithmically specific; mathematics is not yet applicable to them, although the author's work is carried out in this direction.
According to the author, such completion is the main criterion of an artificial system's ability to understand. When the “picture of the world” is limited, as in chess, we are able to lay down explicit algorithms for producing knowledge, that is, the possible moves, so that a chess program can orient itself in any arrangement of pieces, even one never encountered before. But how to do this in the real world, where there are many orders of magnitude more rules, is not yet known, and this constitutes the main direction of the author's research.
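As a small sketch of what "explicit algorithms for producing knowledge" means in a limited world, consider a toy move generator for a single chess piece; no real engine is implied, only the point that in such a world the rules can be enumerated exhaustively.

```python
# In chess the rules that produce new knowledge (the legal moves) can be written
# down in full. Knight moves only, as a sample.
def knight_moves(file: int, rank: int):
    """All squares a knight can reach from (file, rank) on a 0-based 8x8 board."""
    deltas = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    return [(file + df, rank + dr)
            for df, dr in deltas
            if 0 <= file + df < 8 and 0 <= rank + dr < 8]

# Works for any position, including ones never seen before -- the algorithm
# covers the whole (small) world.
print(knight_moves(0, 0))   # corner: only two legal moves
print(knight_moves(4, 4))   # centre: all eight
```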
Bibliography
1. Common Sense Comes Closer to Computers. Quanta Magazine, April 30, 2020.