This post is a chapter from my book “How to Create Machine Superintelligence”.
Artificial general intelligence is arguably the holy grail of computer science. Despite tangible progress in machine learning in recent years, many computer scientists believe that we are still far away from creating truly intelligent machines. They say that even human-level artificial general intelligence is probably still decades away. The main problem is that we have to integrate reasoning and planning into machine learning systems. So, what can we do about that?
From my point of view, approaches to building artificial general intelligence can be divided into two very broad meta-categories: engineering approaches and evolutionary approaches, though the line between the two is fuzzy.
The engineering approaches can be further divided into two groups, depending on the number and diversity of algorithms they use in an attempt to build AGI. The first group favors architectures with very few similar algorithms, or even one ultimate algorithm; the second favors architectures built from several or many diverse algorithms. A vivid example of the approaches favoring a diversity of algorithms is the OpenCog project, founded by the American scientist and entrepreneur Ben Goertzel. In contrast, some experts are inclined to believe that a very small set of algorithms or computational methods, for example deep neural networks alone, may be sufficient for building AGI.
It is well known that different areas of the neocortex perform different functions: there is a visual area, an auditory area, a motor area, and others. It is also known that if certain areas of the cortex are damaged or completely removed, other areas can, to some extent, take over the functions of the impaired areas. This suggests that the principles of information processing in all regions of the cortex are most likely identical or very similar. This assumption favors the possibility of building AGI using very few algorithms.
Obviously, an intelligent machine has to receive information from the world. In general, the richer the sensory input, the better the understanding of the world the machine can produce. Here, I should note that machines can sense a far wider range of inputs than we can; for example, machines can perceive X-rays, ultraviolet light, and ultrasound. This information must be represented in digital form if we work with digital computational devices.
Given the vast complexity of reality, an intelligent machine should probably have the ability to discard noise and other redundant information in order to process only what is necessary. Otherwise, learning would be a far more computationally expensive process. Of course, in order to get rid of redundant information, the machine first has to determine whether the information is redundant at all.
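As a loose illustration of what discarding redundancy might look like computationally, here is a minimal sketch using principal component analysis (PCA). The choice of PCA, the dimensions, and the noise level are all assumptions made for the demo; an autoencoder or any other compression technique could play the same role.

```python
# A minimal sketch of discarding redundant information via PCA.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensory input: 1000 samples of a 50-dimensional signal
# in which only 3 underlying factors actually vary.
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 50))
observations = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))

# Project onto the principal components and keep only those explaining
# 99% of the variance; the rest is treated as redundant noise.
centered = observations - observations.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
k = int(np.searchsorted(np.cumsum(explained), 0.99)) + 1

compressed = centered @ components[:k].T
print(f"kept {k} of 50 dimensions, shape {compressed.shape}")
```

The point is only that a high-dimensional signal with low intrinsic dimensionality can be compressed aggressively once the machine has estimated which directions actually carry information.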
As we have seen, nothing, including natural intelligence, can precisely simulate reality. Because of this, in order to comprehend the world, we need to generate abstract concepts and operate with them; in other words, to think. The process of thinking, or reasoning, can be considered as adjusting our models of the world.
The idea that there is a hierarchical structure of patterns or models in our mind has probably been around since the discovery of the structure of the neocortex. Individual patterns, which can be words or ideas, represent simpler models from which more complex models can be constructed, and the overall model of the world in our mind consists of all these smaller models. At the physical level, this overall model is the entire neural network of the brain, or at least of the cerebral cortex, while different areas of the cortex represent different parts of this biological neural network, responsible for forming smaller models such as individual words or ideas.
But having only such a hierarchical structure is not enough. We have to build a system flexible enough to generate different kinds of models of reality, from numerous very specific models to extremely abstract ones, depending on the situation, and these models should be interconnected with each other. In other words, in addition to a vertical hierarchical structure, there should also be an elaborate horizontal structure.
It is now widely accepted that logical reasoning is not an innate feature of our minds. In fact, we often tend to use analogies rather than logic when we think. This is very similar to operations on vectors, which we mentioned when discussing natural language processing in the previous chapter. However, standard word embedding techniques alone cannot capture word meanings the way natural intelligence does. And it’s not just because we cannot define synonyms using standard word embedding techniques; given the rapid progress in deep learning, a working method of defining synonyms is likely to appear in the very near future. The deeper problem is that when we comprehend the world naturally, we don’t derive the meaning of words only from the contexts in which they appear. Instead, we use information coming from all our senses.
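To make the analogy-as-vector-arithmetic point concrete, here is a toy sketch of the classic king - man + woman ≈ queen example. The embeddings below are hand-made for illustration; in practice they would come from a trained model such as word2vec or GloVe.

```python
# Toy word embeddings whose two dimensions loosely encode
# "royalty" and "gender"; real embeddings have hundreds of
# dimensions learned from data.
import numpy as np

embeddings = {
    "king":  np.array([0.9,  0.9]),
    "queen": np.array([0.9, -0.9]),
    "man":   np.array([0.1,  0.9]),
    "woman": np.array([0.1, -0.9]),
    "apple": np.array([-0.9, 0.0]),
    "river": np.array([-0.5, 0.4]),
}

def most_similar(vector, exclude):
    """Return the word whose embedding has the highest cosine
    similarity to `vector`, skipping the words in `exclude`."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(candidates[w], vector))

# The classic analogy: king - man + woman ≈ queen
result = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(most_similar(result, exclude={"king", "man", "woman"}))  # queen
```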
However, we need to use logic in order to produce more accurate models of reality. Probably, the implicit rules of logic that we use when we think can be learned by observing the world. And, probably, an unsupervised machine learning system could learn logic the same way.
Switching between different sub-models, rather than constantly running an overall model of the world, is presumably the preferable strategy, because running the entire model would be much more computationally expensive. There should be some trade-off between running an overall model of the world and smaller sub-models, and the intelligent machine should be able to find this optimal trade-off. The process is somewhat similar to natural attention. Supposedly, we can use reinforcement learning to get these models to do something useful.
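Here is a deliberately simplified sketch of that idea: choosing which sub-model to activate is framed as a one-step, bandit-style reinforcement learning problem. The toy environment, the 0/1 reward, and the numbers of contexts and sub-models are all hypothetical.

```python
# Learning which sub-model to run in which context, as a toy
# reinforcement learning (contextual bandit) problem.
import numpy as np

rng = np.random.default_rng(0)
n_contexts, n_submodels = 5, 3

# Hidden ground truth: which sub-model is actually useful where.
best_submodel = rng.integers(n_submodels, size=n_contexts)

q = np.zeros((n_contexts, n_submodels))  # estimated value table
alpha, epsilon = 0.1, 0.1                # learning rate, exploration rate

for _ in range(5000):
    context = rng.integers(n_contexts)
    # Epsilon-greedy choice of which sub-model to activate.
    if rng.random() < epsilon:
        action = int(rng.integers(n_submodels))
    else:
        action = int(np.argmax(q[context]))
    # Reward 1 if the useful sub-model was chosen, else 0.
    reward = float(action == best_submodel[context])
    q[context, action] += alpha * (reward - q[context, action])

print(np.argmax(q, axis=1))  # learned policy
print(best_submodel)         # ground truth; the two should match
```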
Defining a correct reward function is critically important for building beneficial, or at least safe, AGI. In simple terms, AGI should probably make its possessors happy or satisfied. But how can happiness or satisfaction be measured? Perhaps these ephemeral psychological states can be measured through certain patterns of brain activity in the AGI’s owners.
The self-awareness of such an artificial intelligence system would be the understanding (sub-model) of its own place in its model of the world.
There are many mathematical techniques for modeling, and neural networks are just one of them. Other very interesting modeling techniques, which are less prominent in AI today, include probabilistic programming and cellular automata. But we shouldn’t completely ignore the possibility that, at this stage, we may not yet have all the mathematical apparatus needed for building AGI.
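As a small taste of the cellular-automaton family, here is a sketch of Wolfram’s elementary Rule 110, which is known to be Turing-complete. The grid width, the number of steps, and the periodic boundary are arbitrary choices for the demo.

```python
# Elementary cellular automaton, Rule 110.
import numpy as np

def step(cells, rule=110):
    """Apply one step of an elementary cellular automaton."""
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    neighborhood = 4 * left + 2 * cells + right  # value 0..7 per cell
    table = (rule >> np.arange(8)) & 1           # the rule as a lookup table
    return table[neighborhood]

width, steps = 64, 32
cells = np.zeros(width, dtype=int)
cells[width // 2] = 1                            # start from a single live cell

for _ in range(steps):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

Even this ten-line system produces complex, hard-to-predict structure, which is exactly why cellular automata remain an interesting candidate substrate for modeling.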
But let’s consider the hypothesis that AGI can be built on the basis of deep learning (it may be terribly wrong, of course). From my perspective, in deep learning terms, our mental activity could be viewed mostly as generative models of our senses and internal representations that are constantly being fed into a reinforcement learning procedure, which apparently has a very complicated reward function. But what are these internal representations, and how do they form in the first place? The internal representations are just memories and perhaps some congenital mental formations. The generative models are induced by memories, but they also produce new memories themselves.
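The following is only a structural sketch of that loop, not a working learner: every class, dimension, and the placeholder reward function are hypothetical stand-ins meant to show how the pieces would connect.

```python
# Structural sketch: a generative model compresses sensory input into
# internal representations, which are stored as memories and fed into
# a reinforcement learning procedure.
import numpy as np

rng = np.random.default_rng(0)

class GenerativeModel:
    """Stand-in for a learned generative model: here it merely projects
    sensory input into a low-dimensional internal representation with a
    fixed random matrix."""
    def __init__(self, sensory_dim, latent_dim):
        self.encoder = rng.normal(size=(sensory_dim, latent_dim))

    def encode(self, observation):
        return np.tanh(observation @ self.encoder)

class Memory:
    """Internal representations accumulate as memories; memories are in
    turn what future generative models would be trained on."""
    def __init__(self):
        self.traces = []

    def store(self, representation):
        self.traces.append(representation)

def reward_function(representation):
    # Placeholder for the presumably very complicated reward function.
    return float(np.sum(representation))

model, memory = GenerativeModel(sensory_dim=16, latent_dim=4), Memory()
for _ in range(10):
    observation = rng.normal(size=16)  # stand-in for the senses
    z = model.encode(observation)      # internal representation
    memory.store(z)                    # the representation becomes a memory
    reward = reward_function(z)        # ...and is fed to an RL procedure

print(len(memory.traces), "memories stored")
```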
It is widely accepted that we have some innate reactions triggered by certain patterns. For example, scenes with unnatural-looking or mutilated body parts can provoke horror or repulsion. Conversely, seeing other patterns, for example those of a sexual character, can induce unconditional temptation, attraction, and affection. The list of such congenitally predisposed reactions to certain patterns is probably humongous in humans. Moreover, the behavior of some animals, and especially insects, is largely predisposed or even entirely determined by their innate instincts, and yet these organisms can adapt to an ever-changing and often hostile environment. It is not entirely clear whether such innate reactions are an indispensable component of the highest forms of intelligent behavior. Even if they are, we already have direct analogies to this innate predisposedness in robotics, where some types of robots don’t really learn from scratch. Some of these innate reactions can also be viewed as a reward function.
The output of the generative models could also be fed into other algorithms, and the input to a generative model could be taken not only from the machine’s sensors and the associative memory that primarily induce the formation of the models, but also from local or global data resources stored in digital format. In this case, such a system would be capable of performing additional functions.
Imagine an AI system that could generalize as well as or even better than we can, instantly access any information on the web or in local databases, never get tired or bored, and run a vast number of specialized algorithms. Could such a system be called superintelligence? In my opinion, probably yes.
Now, let’s switch to discussing pure evolutionary approaches to building AGI. The human brain, an amazing thinking device, is a product of biological evolution. The theory of evolution through natural selection is one of the most significant ideas in the history of science. It was first comprehensively developed, with respect to biology, by Charles Darwin, whom I regard as one of the greatest thinkers of all time, in his groundbreaking book “On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life”, published in the middle of the 19th century. Darwin’s theory turned biology upside down at the time. However, the theory of evolution by natural selection is relevant not only to biology.
Many natural and especially social processes can be viewed in evolutionary terms. Take, for example, the economy. One reason why a free, liberal economy is more efficient than a planned economy is that a liberal economy has more intrinsic evolutionary mechanisms, by which I mean, first of all, the natural selection of the fittest enterprises.
Evolution theory, of course, has its place in computer science. In 1975, John Henry Holland published his seminal monograph “Adaptation in Natural and Artificial Systems”, where he laid out the foundations of evolutionary computation.
Evolutionary computation is a subcategory of machine learning. There are many diverse and very interesting strategies in the field. Probably the most frequently used are so-called genetic algorithms, which, together with genetic programming, fall under the broader umbrella of evolutionary algorithms.
The three most important principles of evolution, both biological and computational, are heredity, variation, and selection. The general framework of a genetic algorithm is relatively simple, as the sketch below shows. First, we generate a number of candidate solutions to a problem and evaluate the performance of each. Then, we create a new generation of solutions by crossing good solutions from the previous generation with each other, from time to time introducing random mutations to increase variation. We repeat this cycle until we have a reasonably good solution to the problem. This process can be parallelized very efficiently, which is very useful in practice. Genetic algorithms usually find their use when the problem is very complex and there is no need to find the best possible solution. They can be applied in a humongous number of domains.
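Here is a minimal sketch of that loop applied to a toy problem: evolving a bit string toward all ones, the classic OneMax benchmark. The population size, mutation rate, and selection scheme are arbitrary illustrative choices.

```python
# A minimal genetic algorithm for the OneMax problem.
import numpy as np

rng = np.random.default_rng(0)
genome_length, population_size, mutation_rate = 32, 50, 0.02

def fitness(genome):
    return genome.sum()  # number of ones in the bit string

population = rng.integers(0, 2, size=(population_size, genome_length))

for generation in range(200):
    scores = np.array([fitness(g) for g in population])
    if scores.max() == genome_length:
        break  # a perfect solution has evolved
    # Selection: keep the better half of the population as parents.
    parents = population[np.argsort(scores)[-population_size // 2:]]
    children = []
    for _ in range(population_size):
        a, b = parents[rng.integers(len(parents), size=2)]
        point = rng.integers(1, genome_length)       # one-point crossover
        child = np.concatenate([a[:point], b[point:]])
        flips = rng.random(genome_length) < mutation_rate
        child[flips] ^= 1                            # random mutation
        children.append(child)
    population = np.array(children)

scores = np.array([fitness(g) for g in population])
print(f"best fitness {scores.max()}/{genome_length} after {generation} generations")
```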
One possible application of genetic algorithms in artificial intelligence is as an optimization method for deep neural networks, in place of gradient-based approaches. More broadly, evolutionary computation is used to build and optimize more efficient artificial neural networks, an approach known as neuroevolution.
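As a loose illustration, here is a sketch of evolving the weights of a tiny fixed-architecture network to solve XOR, with mutation standing in for gradient descent. The network shape, mutation scale, and offspring count are arbitrary assumptions; practical neuroevolution systems (NEAT and its descendants, for instance) are far more elaborate.

```python
# Evolving the 17 weights of a fixed 2-4-1 network to fit XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def forward(weights, x):
    """2-4-1 network; `weights` is a flat vector of its 17 parameters."""
    w1, b1 = weights[:8].reshape(2, 4), weights[8:12]
    w2, b2 = weights[12:16], weights[16]
    hidden = np.tanh(x @ w1 + b1)
    return hidden @ w2 + b2

def loss(weights):
    predictions = np.array([forward(weights, x) for x in X])
    return np.mean((predictions - y) ** 2)

best = rng.normal(size=17)
for _ in range(500):
    # Offspring are mutated copies of the current best individual;
    # the best offspring replaces the parent if it improves the loss.
    offspring = best + 0.1 * rng.normal(size=(50, 17))
    losses = np.array([loss(w) for w in offspring])
    if losses.min() < loss(best):
        best = offspring[np.argmin(losses)]

print(loss(best))  # typically close to zero
```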
But evolutionary computation can potentially be used in an even broader role: building true AGI. Supposedly, an AGI program could be devised as the product of an evolutionary process. However, there are some problems with this approach. First, since intelligence is a very complex phenomenon, we would have to correctly specify the fitness function for the evolutionary process; moreover, this fitness function would have to be aligned with our goals and values. Second, we should probably set up a proper structure for such an AGI program at the very beginning of the evolutionary process; otherwise, the evolution may take an unreasonably long time. Third, we would most likely not be able to rely on simulation because, given the complexity of our program, adequate simulated environments would be even more complex, and more computationally intractable to simulate, than the program itself. For this reason, we would most likely need to build a number of physical robotic agents, whose software would represent an evolving AGI program, and place these agents in a real environment. But then we run into another problem: such an evolutionary process could consume enormous physical resources and time. However, we probably don’t need to evolve a very powerful AGI program in a very complicated simulated environment. Maybe we would only need to evolve some crucial component of an AGI program in simulation and then augment it with other functions, or simply scale it up, for the real world.
In order to perform various practical intelligent tasks, computer systems have to acquire an appropriate amount of common human knowledge. I don’t think this is a very hard problem, especially if we had machines that could learn simply by observing the world. We will probably soon see the appearance of various new artificial intelligence personal assistants collecting immense amounts of data, including data about our everyday life.