Exploratory learning algorithm

From supermemo.guru

This article by Dr Piotr Wozniak is part of SuperMemo Guru series on memory, learning, creativity, and problem solving.

Curiosity is priceless

Artificial intelligence researchers would pay a lot to get a good grasp of the mechanism of human curiosity: the learn drive. The brain came up with cheap and simple tricks to build intelligence via curiosity. Those tricks can be explained to a first grader. However, few people truly comprehend how curiosity functions. The entire school system is based on principles that disregard the drive for human intelligence. This indicates that the importance of choice and knowledge valuation is unappreciated.

Curiosity belongs to the most unappreciated values of the human brain

Curiosity drives learning

The algorithm that the brain uses to explore the environment is based on curiosity. The algorithm can serve simple goals such as finding food. It has also made it possible to build our civilization. The algorithm balances contingent goals (e.g. finding food) with a divergent, open-ended quest for value.

The curiosity-based exploratory algorithm looks as follows:

  • of all available environments, choose those that provide the best average learning experience (maximum learntropy)
  • of all available information channels, choose those that maximize the gain in knowledge
  • of all available pieces of information, pick and explore those that are most interesting
  • in estimating the value of knowledge, use concept valuation in the concept network of the brain (see: knowledge valuation network)

All the choices attempt to maximize gains over the long term, and variants of multi-armed bandit strategies are involved. The exploratory algorithm rewards all actions and behaviors that maximize the long-term inflow of valuable knowledge (see: knowledge valuation).
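
The channel selection described above can be illustrated with an epsilon-greedy strategy, one of the simplest multi-armed bandit variants. This is only a sketch under stated assumptions, not a model of the brain: the function names, the idea that each channel has a fixed numeric "knowledge gain", and the parameters (epsilon, steps, noise level) are all hypothetical choices made for the example.

```python
import random


def choose_channel(estimates, epsilon, rng):
    """Mostly pick the channel with the best estimated gain; sometimes explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # exploratory choice
    return max(range(len(estimates)), key=lambda i: estimates[i])


def explore(true_gains, steps=2000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit over hypothetical information channels.

    true_gains: assumed average knowledge gain per channel (unknown to the learner).
    Returns the learner's gain estimates and how often each channel was used.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_gains)
    counts = [0] * len(true_gains)
    for _ in range(steps):
        i = choose_channel(estimates, epsilon, rng)
        reward = rng.gauss(true_gains[i], 0.1)  # noisy observed gain
        counts[i] += 1
        # Incremental running mean of the rewards seen on channel i.
        estimates[i] += (reward - estimates[i]) / counts[i]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = explore([0.2, 0.5, 0.9])
    print("estimated gains:", [round(e, 2) for e in estimates])
    print("times chosen:", counts)
```

With these assumed gains, the learner ends up spending most of its time on the richest channel while still sampling the others now and then, which is the trade-off between exploitation and exploration that bandit strategies address.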

Learning is curiosity-based and occurs on the way towards goals

Knowledge quest

We can find a rudimentary implementation of the exploratory algorithm in worms. They turn to environments that maximize reward (e.g. food). However, the human brain has perfected an entire system for maximizing knowledge. The quest for knowledge is one of the most powerful drives that determine human behavior. The human learn drive uses the entire concept network of the brain to determine which pieces of information provide the highest value. The brain matches information to prior knowledge. When it can detect the unexpected, the surprising, the novel, it rewards the actions that lead to learning. It drives motivation for further learning in the same field in the same context. Each piece of knowledge is added to the concept network incrementally (see: Jigsaw puzzle metaphor).

Figure: C. elegans has a nervous system made of only 302 neurons. However, this is enough to implement an exploratory algorithm that is reminiscent of human curiosity, creativity, and problem solving. When the worm finds a patch of food, it will explore it. However, on occasion it will take an unexpected dash in a random direction in search of new patches (bacteria). Similar algorithms can be found in other animals. However, the human learn drive is far more complex. It is based on knowledge valuation, and the exploratory breaks are reserved for periods of learntropy dropping well below the expected value. Human creativity is also based on knowledge, while in the worm its only aspect is a random choice of direction. For the worm, a new patch of bacteria is a problem solved; for a human, it might be a new idea for terraforming Mars. Last but not least, the metaphoric tool for inducing learned helplessness (marked as "school") will, in primitive animals, likely take only the form of drive habituation. Nevertheless, the little worm may present a convincing illustration that the intelligent missile metaphor is far more universal and may be relevant to primitive nervous systems as well. For more on the universality of the learn drive see: The psychology and neuroscience of curiosity
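
The contrast drawn in the figure caption can be put side by side in a few lines. The sketch below is a loose illustration only: the function names, the fixed dash probability, the numeric learntropy scale, and the "well below expected" margin of 50% are assumptions introduced for the example, not measured values.

```python
import random


def worm_takes_dash(rng, dash_probability=0.05):
    """Worm: leave the current patch at random, regardless of what was learned."""
    return rng.random() < dash_probability


def human_takes_break(recent_learntropy, expected, margin=0.5):
    """Human: take an exploratory break only when recent learntropy
    falls well below the expected value (here: below margin * expected)."""
    if not recent_learntropy:
        return False  # nothing observed yet, no reason to leave
    average = sum(recent_learntropy) / len(recent_learntropy)
    return average < margin * expected


if __name__ == "__main__":
    rng = random.Random(42)
    print("worm dashes:", worm_takes_dash(rng))
    print("bored human leaves:", human_takes_break([0.2, 0.1, 0.3], expected=1.0))
    print("engaged human stays:", not human_takes_break([0.9, 1.1, 1.0], expected=1.0))
```

The worm's decision uses no knowledge at all, while the human-like rule depends on a running comparison between what was expected and what was actually gained.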

Knowledge valuation

All concepts in a concept network have their valuations set. My grandmother may have a higher value in my brain than my friend's grandmother. I love books on biochemistry and cannot stand classic Polish literature. Those book concepts have different valuations in my brain. Each time a valuable concept is activated, we may experience a tiny dose of reward and a tiny increase in motivation to perform actions that enhance this activation.

When I think of biochemistry, other concepts in the field may get activated at random. I may think of amino acids as they were one of the first things I learned about in a biochemistry book five decades ago. This may generate an urge to review some of the amino acid material in incremental reading, or perhaps watch a video lecture on the subject. I may curiously wonder what role amino acids play in an optimum diet. That urge to know is called the learn drive (or simply curiosity).

Each time we learn, new pieces of knowledge get matched with the prior knowledge and if novel associations are found, we experience pleasure, or enthusiasm, or even euphoria. It all depends on the valuations of the concepts connected in the newly formed association. If I learn about amino acids and figure out that they play an important role in time-restricted feeding, I instantly get a dose of enthusiastic thinking of how my eating habits influence my health (values are highly subjective in reference to self). The concepts activated are (1) AA and (2) TRF. The new knowledge brings a connection: AA+TRF => health! The brain loves it! I experience the pleasure of learning and I am highly motivated to learn more.
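
The dependence of reward on concept valuations can be sketched in code. This is a toy illustration under loud assumptions: valuations are made-up numbers between 0 and 1, an association is just an unordered pair of concepts, and the reward is taken to be the product of the two valuations. None of these choices comes from the text; they only make the idea concrete.

```python
def association_reward(valuations, known_links, a, b):
    """Return the reward for linking concepts a and b.

    A link that is already known brings no reward (no novelty).
    A novel link is rewarded in proportion to the valuations of both
    concepts, and is then stored as prior knowledge.
    """
    link = frozenset((a, b))
    if link in known_links:
        return 0.0
    known_links.add(link)
    return valuations[a] * valuations[b]


if __name__ == "__main__":
    # Hypothetical valuations for one particular brain.
    valuations = {
        "amino acids": 0.8,
        "time-restricted feeding": 0.9,
        "classic Polish literature": 0.1,
    }
    known = set()
    print(association_reward(valuations, known, "amino acids", "time-restricted feeding"))
    print(association_reward(valuations, known, "amino acids", "time-restricted feeding"))
    print(association_reward(valuations, known, "amino acids", "classic Polish literature"))
```

In this toy model, the first AA+TRF association pays well, repeating it pays nothing, and an association with a low-value concept pays little.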

New knowledge that associates high-value concepts of prior knowledge into new value provides high reward in learning

Goals vs. passions

Setting goals in learning is usually inhibitory. Goals may emerge naturally in the learning process. If so, all sorts of knowledge leading to goals become the feeding ground for passions.

In terms of a concept network, goals are concepts that represent future desired states. Such state concepts are marked with unusually high value. At the start of the exploration, early in life, a goal may be as simple as reaching for a ball. Gradually, a concept network of possible future states crystalizes. Some goals receive higher valuations. Being a surgeon may become a goal. Those valued goals become drivers of passions. Knowledge feeds into the value of goals, and goals determine the value of knowledge in a mutual feedback of top-down and bottom-up valuation signaling and consolidation.

It is a grave mistake to decide at the age of 10 to become a great lawyer (e.g. just because lawyers earn good money). It is a criminal mistake if the goal is imposed by a parent. However, if events of life generate an admiration for a profession, e.g. as a result of watching numerous crime dramas or legal thrillers, the emergent goal is likely to provide a healthy set of high knowledge valuations. School curriculum is a set of goals that efficiently ruin the joy of learning within nearly all studied areas (except those where natural passions outdistance the curriculum).

Open-endedness and divergence are necessary for human intelligence. However, these do establish emergent goals which then are value anchors for future passionate activities.

Goals are welcome in learning only if they are emergent

Improving AI

If AI could effectively use the learn drive, it could make a more educated quest for new knowledge. Instead of gobbling half the Internet, it could seek golden nuggets that would boost the quality of its modelling. Instead of sticking with political correctness and scientific "consensus", AI could truly step into innovation by creatively expanding on what humans know (see: creativity). Creativity is nothing other than a stochastic search that results in learning derived from random activations of the brain set in a creative mode. Musings bring new knowledge. This process is cheap. All it takes is a match between high-value activations (rooted in prior knowledge) and new random activations that may bring new high-value associations. See: Creativity
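
The stochastic search described above can be sketched as follows. This is a deliberately naive illustration, not a proposal for a real AI system: the concept names, the numeric valuations, the product used to score an association, and the function name creative_search are all assumptions made for the example.

```python
import random


def creative_search(valuations, known_links, active, trials=100, seed=0):
    """Pair one highly valued active concept with randomly activated concepts
    and return the most valuable association that is not yet known."""
    rng = random.Random(seed)
    others = [c for c in valuations if c != active]
    best_link, best_value = None, 0.0
    if not others:
        return best_link, best_value
    for _ in range(trials):
        candidate = rng.choice(others)  # random activation
        link = frozenset((active, candidate))
        if link in known_links:
            continue  # already part of prior knowledge, nothing new
        value = valuations[active] * valuations[candidate]
        if value > best_value:
            best_link, best_value = link, value
    return best_link, best_value


if __name__ == "__main__":
    valuations = {"diet": 0.9, "amino acids": 0.8, "sleep": 0.7, "war dates": 0.1}
    known = {frozenset(("diet", "amino acids"))}
    link, value = creative_search(valuations, known, active="diet")
    print(sorted(link), round(value, 2))
```

Each trial is cheap: a random activation, a lookup in prior knowledge, and a comparison of valuations. Only novel, high-value matches survive.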

Artificial intelligence would benefit from simple implementations of the learn drive and creativity

School disaster

Coercive learning at school is anathema to intelligence. Instead of the brain making its own choices as to the maximization of learntropy, someone attempts to feed the brain with knowledge in the wrong form, at the wrong time, for the wrong activation status, and for an incorrect status of prior knowledge.

Instead of learning about optimum diet, which is my core excitement of the day, at school, I may need to memorize that the Mexican-American War started in 1846. Perhaps a crafty teacher might compare the US actions in 1846 to those of Putin today. That instantly raises the valuations. However, if the brain is passionately mulling over one's feeding habits, it is the knowledge associated with that field that sticks best. Passion-driven knowledge gets high valuations, it gets well consolidated, and it provides a set of semantic bridges in the status of the concept network. Instead, school learning is usually horribly asemantic.

Boring learning at school results in inattention and loss of authority on the part of the teachers. This is a horrible waste of resources at the taxpayers' expense. See: Lex Libertas

Compulsory schooling grew to be the opposite of what we expect of intelligent education

School myths

The teachers' lobby cultivates and propagates a set of myths that make it harder to follow the above prescription. The myths are too numerous to list (see: Neuromythology). However, I can list some for the purpose of labelling the falsehoods:

  • Myth: brain chooses what is convenient. Fact: brain chooses what is pleasantly productive
  • Myth: choosing pleasure leads to no good things. Fact: pleasure of learning is the best indicator of good learning (see: Pleasure of learning)
  • Myth: teacher is important. Fact: teacher is just one of the channels of information
  • Myth: the effectiveness of learning is limited by knowledge. Fact: the algorithm works equally well in a worm, in a monkey, in a child and in an Einstein
  • Myth: kids are too immature to predict the future. Fact: they do not have to
  • Myth: kids are too immature to make their own choices. Fact: the exploratory algorithm does not change with age. It simply uses an increasing set of prior knowledge
  • Myth: the art of learning should be taught at school. Fact: meta-knowledge of learning improves while learning
  • Myth: children need to be shown interesting things. Otherwise, they might never discover them. Fact: in 2024, nothing can hide from a curious child
  • Myth: kids without guidance will waste their lives. Fact: Unschooling invariably produces happy and productive individuals
  • Myth: without school, kids will end up as criminals. Fact: Unschooling invariably produces happy and productive individuals
  • Myth: teacher needs to show a child her talents. Fact: talents are rewarding and self-enhancing. No assistance needed
  • Myth: YouTube recommends the radical, the negative, and pseudoscience. Fact: YouTube is driven by curiosity. It maximizes productive learning
  • Myth: We have tried the algorithm of freedom in the Middle Ages. Fact: In 1800, school competed well with farms and factories. Today it cannot compete with the web
  • Myth: During summer vacation children stop learning. Fact: Free kids never stop learning. For many kids at school, vacation is the best time to follow their passion (not curriculum)

For more see: Child is always right

For more texts on memory, learning, sleep, creativity, and problem solving, see SuperMemo Guru