How much knowledge can human brain hold
This article by Dr Piotr Wozniak is part of SuperMemo Guru series on memory, learning, creativity, and problem solving.
Estimating total knowledge in the brain
I am finally closer to being able to say how much knowledge a human brain can hold. Scientists have counted the number of synapses in the human brain (above 100 trillion), and have come up with staggering numbers on brain capacity. However, in real life, we see the brain as an unreliable and forgetful device. For an average student, learning 100 new French words for an exam is an effort. To memorize 45,000 words of English in the Advanced English collection is a feat very few have achieved, given the cost in time. In other words, trillions of synapses seem to contradict our struggles with the mere thousands of questions we need to know to function effectively on a daily basis.
SuperMemo estimates
Thirty years ago, using simulations of the learning process in SuperMemo, I predicted that I would not see a human memorize a million items. Today, even a million seems like a super-optimistically high number.
SuperMemo makes estimates easy. We can count individual questions and answers, and worry less about how many bits of information get stored in the brain. However, such estimates are rather inaccurate as a measure of the totality of knowledge. For example, I once computed that I am able to recognize around 30,000 human faces without even trying. Some knowledge flows in effortlessly. SuperMemo measures harder knowledge that we want to preserve due to its high value.
For 32 years now, I have been using SuperMemo to control what knowledge I remember with high recall. In those three decades, there were a couple of breakthroughs where I could sense a significant acceleration or an increase in the quality of learning. In particular, the arrival of incremental reading in 1999 and the employment of the priority queue in 2006 were such booster events. However, the total number of items in the collection is not a true reflection of knowledge. In incremental reading, the tolerance for knowledge overload increases, and the priority queue makes it possible to focus on top priority items, while neglecting lesser knowledge. In other words, we can know a smaller set of items perfectly, and know a huge set of items more superficially. The only good reflection of knowledge would be to consider the probability of recall for all individual items.
Adding up memory retrievability
With the new spaced repetition algorithm, Algorithm SM-17 (2015), we can finally take a repetition history of each item and estimate the chance of recall at any point in time with pretty good accuracy. The last tool missing in the box was a chance to take any point in time (a date), and run the estimate for all items in the entire collection (e.g. "How much did I know on Jan 1, 2000?"). The only problem with such estimates is that they are computationally expensive. Finally, in 2020, SuperMemo 18 makes it possible to choose a number of time points in the lifetime of any collection, and run total knowledge estimates. For any date, the total knowledge estimate is the sum of memory retrievability estimates for that given point in time.
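The summation itself can be sketched in a few lines of Python. This is only an illustration, not SuperMemo's code: Algorithm SM-17 derives retrievability from the full repetition history of each item, whereas the sketch below assumes a simple exponential forgetting curve, R = exp(-t/S), and reduces each item to a hypothetical pair (date of last review, stability in days). All names and numbers are made up for the example.

```python
import math
from datetime import date

def retrievability(last_review, stability_days, when):
    """Assumed forgetting curve R = exp(-t / S); NOT the actual SM-17 formula."""
    if when < last_review:
        # Simplification: an item with no review before `when` counts as unknown.
        return 0.0
    elapsed_days = (when - last_review).days
    return math.exp(-elapsed_days / stability_days)

def total_knowledge(items, when):
    """Sum of retrievability estimates over all items at the given date."""
    return sum(retrievability(last, s, when) for last, s in items)

# Hypothetical three-item collection: (date of last review, stability in days)
items = [
    (date(1999, 12, 1), 31.0),
    (date(1999, 6, 1), 400.0),
    (date(2000, 6, 1), 100.0),  # not yet reviewed on Jan 1, 2000
]

estimate = total_knowledge(items, date(2000, 1, 1))
print(f"Estimated knowledge on Jan 1, 2000: {estimate:.2f} items")
```

The result is a fractional number of items that is always lower than the item count of the collection, which is why the sum of retrievabilities is a more honest measure than the size of the collection.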
Longitudinally largest data set
I safely hold the record for the longest continuous use of spaced repetition (over 32 years at the moment of writing). I will keep the title as long as I keep learning. I was simply the first to start (see: Birth of SuperMemo). I hope to keep going as long as health permits. This is why my own collection is the best source of data today to make lifelong learning estimates (Jan 1, 2020).
There was a minor problem on the way, however. A full record of repetition history has been kept only since 1996 (due to costs and limitations on the available storage in the late 1980s). Seemingly, this might make the first 9 years of data unreliable. However, SuperMemo can provide a reasonable simulation of repetition histories for items with a known number of lapses, repetitions, and the last interval. Those simulations cut the 9 years to roughly 6-7 years, and are accurate enough to make an imperceptible difference to the final outcome unless a major incongruity hides in that early period. I bet my right hand this is not the case. I was easily able to learn at 10,000 items per year in the early years, but total retrievability estimates cut this number substantially.
I split my 32 years of learning into 60 equal intervals and let the computational procedure run overnight. In the morning, I had a beautiful graph that I want to share for the new year 2020. The staggering conclusion is that all my perceptions about acceleration, slowdowns, improved learning, or neglect were largely illusory. For 32 years, I kept building up knowledge at a remarkably steady rate. In the early years, the effort seemed a bit harder, I needed more self-discipline, there were outstanding items, long days of repetitions, etc. In the times of incremental reading, it all turned out to be fun, learning with pleasure, and learning on demand, i.e. learning mostly for a specific purpose when I needed new knowledge. My departure into unrelated areas of learning proceeded at a leisurely pace. With all those seeming upheavals and changes, hardly any kink shows up in the graph. Many people misread that observation as a disappointment with SuperMemo. Metaphorically speaking, I am only surprised by the smooth ride of the rocket. The speed is no less fantastic.
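The procedure of sampling a collection at equally spaced dates and checking how steady the growth is might look like the Python sketch below. It is not the procedure used in SuperMemo 18. The start and end dates are assumptions chosen only to span 32 years, and the knowledge totals are synthetic placeholders (a perfectly straight line), not my data. In practice, each total would come from a total knowledge estimate for the given date.

```python
from datetime import date, timedelta

def sample_dates(start, end, n_intervals):
    """Return n_intervals + 1 dates splitting [start, end] into equal intervals."""
    total_days = (end - start).days
    return [start + timedelta(days=round(total_days * i / n_intervals))
            for i in range(n_intervals + 1)]

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared

# Assumed date range; 60 intervals give 61 sampling dates.
dates = sample_dates(date(1988, 1, 1), date(2020, 1, 1), 60)
years = [(d - dates[0]).days / 365.25 for d in dates]

# Placeholder totals (synthetic straight line), standing in for real estimates.
totals = [1000.0 * y for y in years]

slope, intercept, r_squared = linear_fit(years, totals)
print(f"{slope:.0f} items/year, R^2 = {r_squared:.3f}")
```

A slope that stays stable across sub-periods, with R^2 close to 1, is what "hardly any kink in the graph" means in numerical terms.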
Conceptualization process
To make a good estimate of maximum human knowledge, we need to have data that would span a 100-year period. This is how long an average human might live given a perfect lifestyle, good health, and a couple of other conditions. My 32 years of learning fall into the most uneventful period that is characterized by steady linear progress. In theory, the conceptualization process in the concept network of the brain should begin slowly. It might possibly saturate when the ceiling of the brain size is reached. Those early and late changes in the speed of conceptualization will be reflected in the underlying speed of learning. This is illustrated in the figure with the light blue line marked as computational Capacity:
Figure: Hypothetical course of learning and conceptualization in a fixed-size concept network. The naïve network begins the learning process at high plasticity (in red). As individual concepts form, they are consolidated and stabilized. The overall stability of the network keeps increasing (dark blue). The speed of conceptualization (in orange) is a resultant of plasticity and stability. It reaches its theoretical maximum somewhere on the way from the random graph stage to a sparse representation stage. This is the time of a large supply of concepts that may be subject to generalization, and a good balance between stabilization and forgetting. The overall problem solving capacity of the network (light blue) is negligible at first, and tends to saturate with network stabilization. A large number of well-stabilized concepts makes it harder to find new plastic network nodes for further conceptualization. The maximum capacity of the network depends on its size. Speed of learning in spaced repetition at older ages seems to indicate that the size of the concept network of the human brain is high enough to provide for lifelong learning without noticeable saturation. See: Conceptualization theory of childhood amnesia and How much knowledge can human brain hold
Combining data from many users
To make a good estimate, I would need data from many users. In particular, the first 10 years of life, and the data past the age of 80 are most precious. We have surprisingly many users in those age categories, but all users learn at different speeds, use different strategies, learn different things, etc. This is why we need collections that span a decade or more. Short periods cannot be collated from many collections because they are largely uninformative. Regrettably, I have hundreds of collections submitted in various circumstances, and they mostly come from young or middle-aged users. They also predominantly come from novices. To this day, I do not have any data from users above the age of 80.
I was hoping that my own data might hint if the fifth decade of life brings some slowdown to the learning process. However, my data has been polluted with a great deal of self-experimentation. To account for artifacts of experimental learning, I needed extra correction procedures. In the end, depending on the method, I received a different verdict. My learning might have slowed down a bit, or it might have actually accelerated (my unreliable perception leans towards the latter).
By combining data from collections of users at different ages, I arrived at a hypothetical course of learning in a lifetime. Unfortunately, the projected slowdown after 70 is just a hypothesis. I tried to imagine myself in a wheelchair at 100, and the mere loss of mobility, I imagine, would undermine my ability to progress with learning at the present rate. I have no data to prove the slowdown. It is just a common-sense speculation and my best guess based on four decades of analysis and deliberations on the nature of learning.
Figure: Projected course of lifelong knowledge acquisition in spaced repetition. The curve was compiled with the use of data from users of different ages. The only consistent change in the speed of learning seems to occur at earlier ages (conceptualization stage). The projected slowdown at older ages is hypothetical and may be associated with inevitable aging rather than with the limit on the size of the concept network. Moreover, the impact of aging may not necessarily be associated with cognitive aging.
Matching the projection with my own data
A perfect match between my data and the projection is deceptive. The projection itself was largely rooted in my own data. It is the patches on the sides that needed more assistance from other user collections. The early age acceleration is well measured and intuitively obvious. It is rather the lack of correspondence between perceived accelerations and slowdowns that surprised me. The linearity of the process beyond the age of 20 is striking. Could this be the effect of the saturation of the conceptualization process? More learning implying more interference?
My own learning curve was inserted into the projection above and marked with red datapoints:
Figure: Projected course of lifelong knowledge acquisition in spaced repetition. The curve was compiled with the use of data from users of different ages. The middle course of the projected curve has been replaced with actual data from my own 32-year-long learning process (Piotr Wozniak, December 28, 2019). Instead of using the usual metric, i.e. the count of items, the curve uses the sum of retrievability estimates for the collection. All perceived accelerations in learning, e.g. caused by innovations in incremental reading, turned out largely illusory. This raises an exciting possibility that the overall speed of learning may be relatively constant as suggested by some properties of memory derived from the neurostatistical model of memory. The projected slowdown at older ages is hypothetical and may be associated with inevitable aging rather than with the limit on the size of the concept network.
Steady rate of learning
Considering that our ability to process information is affected by homeostatic fatigue, and that our ability to consolidate memory seems limited by the biological properties of the network, it is plausible to believe that we might all learn at a relatively steady rate throughout our lives. Those who use spaced repetition can decide what to remember, but do not increase their overall amount of knowledge. Those who do not use spaced repetition might be learning at a similar rate, except they learn a great deal of trivia (e.g. names of movie stars, brands of food in the shop, or history of their favorite football team). Since the neurostatistical model of memory states that the prime source of forgetting is interference, adding more items to SuperMemo may speed up forgetting. Those processes of learning and forgetting seem to balance out. I certainly had times of tangible slowdown (e.g. at the age of 30, in the early days of SuperMemo World while being busy with business matters). I also had periods of massive expansion of my knowledge (e.g. when studying the science of sleep). Those periods do not stand out from the average on the graph. This is one more hint that we should avoid pushing hard for more hard learning. I claim that pleasure of learning is vital for efficient coherent learning and healthy conceptualization. When fatigue sets in, a well-schooled student may push for more learning, while it makes more sense to get some sleep, deposit the "next layer of knowledge" and come back refreshed on the following day (or after a siesta).
Total knowledge of the human brain
A good measure of the total knowledge in the brain will probably come from combining molecular and microscopic data. Once we know exactly what roles are played by individual types of synapses, we will be in a good position to toss out a big number. However, it is far more interesting for an individual student to know how much he or she can remember in terms of questions stored in a knowledge collection designed to keep vital knowledge for life. The size of that ultimate knowledge seems pretty easy to estimate today. That size can be approximated by a linear projection of the speed of learning for the number of years of efficient learning with a healthy brain. As it is not recommended to start spaced repetition before the age of 10-12 (unless out of the child's own will), we can ignore the early slow period of conceptualization (see: SuperMemo does not work for kids). We can only hypothesize on the degree of slowdown due to aging. The slowdown may be cognitive (e.g. due to senile dementia, sleep disorders, etc.). It may also be the effect of aging unrelated to the brain (e.g. bad sight, poor mobility, struggles with health issues, etc.). There can also be some acceleration in the early period of learning due to a small size of the learning collection, and a gradual increase in learning skills (esp. formulation of knowledge).
Considering all the above factors, a student can estimate her own knowledge at retirement by (1) taking a SuperMemo collection that is at least 1-3 years old, (2) estimating Total knowledge to obtain the yearly speed of learning, and (3) multiplying that speed by the relevant number of years remaining till retirement. For most people, the speed of learning in free learning falls in the 500-4000 items/year bracket. This will probably set the upper limit of knowledge to 250-300 thousand items before the retirement age. Note that I did not add the usual consideration of time spent on learning per day. In free learning, that time will probably not exceed 2-4 hours of study, of which a great deal might be devoted to processing knowledge (as in incremental reading). I have come to believe that adding time may not add to overall knowledge. This is verifiable. However, verification might stand in opposition to the fundamental law of learning. If you push harder or learn faster than estimated herein, please let me know. In SuperMemo 18, you can make all estimates on your own.
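The three-step estimate can be written down as a short Python sketch. It rests on one reading of the steps above: the speed of learning is taken as Total knowledge divided by the age of the collection, and knowledge at retirement is the current total plus that speed multiplied by the years remaining. The function names and the sample student are hypothetical, and the projection is linear, i.e. it ignores any slowdown due to aging.

```python
def learning_rate(total_knowledge, collection_age_years):
    """Items per year, assuming steady learning since the collection was started."""
    if collection_age_years <= 0:
        raise ValueError("collection age must be positive")
    return total_knowledge / collection_age_years

def projected_knowledge(total_knowledge, collection_age_years, years_remaining):
    """Linear projection: current total plus yearly rate times years remaining."""
    rate = learning_rate(total_knowledge, collection_age_years)
    return total_knowledge + rate * years_remaining

# Hypothetical student: a 2-year-old collection with Total knowledge of 3000
# items, and 40 years remaining till retirement.
rate = learning_rate(3000, 2)                  # 1500.0 items/year
at_retirement = projected_knowledge(3000, 2, 40)  # 63000.0 items
print(f"{rate:.0f} items/year, {at_retirement:.0f} items at retirement")
```

With 1500 items/year, this hypothetical student falls well within the 500-4000 items/year bracket. A collection younger than a year or so would make the computed rate too noisy to project over decades.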
Many students will fail to stick to the lower boundary of the estimate (500 items/year). However, that failure does not ever seem to be related to an inherent limitation in the brain (among those who are capable of undertaking spaced repetition). There is a whole host of limits in the environment, in human emotion, in motivation, in the availability of time, etc. There is little or no correlation with IQ. The rule of thumb is that a happy student who pursues learning with pleasure should easily fit in the estimated brackets. Lack of self-belief may be an actual physical limitation to one's own progress. I hope this text helps dispel some mythology instilled at school.
I am pretty sure that the awareness of the brain limit will make many a young man embark on an attempt to beat the number in a decade. I am curious how this will work out. Please let me know, and don't forget about brain hygiene (see: Memory overload).