SuperMemo Algorithm: 30-year-long labor


This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)

Why could a simple idea not materialize?

A perfect mathematical description of long-term memory is just around the corner. It may seem amazing today, but despite its simplicity, it took three long decades of starts and stops before a good model could emerge. In human endeavor, science is often a side effect of curiosity, while other burning projects receive priority. The problem with science and invention is that they are blind and unpredictable. The uncovered truths show their power and value only years after the discovery. The moral of the story is that all governments and companies should spare no resources for good science. Science is a bit like SuperMemo: today the reward seems small, but in the long term the benefits can be stunning.

Today we can almost perfectly describe memory with the toolset employed in Algorithm SM-17. The only limit on further progress in understanding memory is the imagination, availability of time, and the ability to pose the right questions. We have all the tools, and we have a lot of data. We even have a nice portion of data combined with sleep logs that can now add a new dimension to the model: homeostatic readiness for learning, homeostatic fatigue, and even the circadian factor.

The SuperMemo story shows that if you have an idea, you should put it to life (unless you have another, better idea). The problem with ideas is that they often seem only a fraction as attractive as they really are. I plotted my first forgetting curve in 1984, forgot about it within a few months, and recalled the fact 34 years later, at a time when my whole life revolves around forgetting curves. Imagine the surprise! When I came up with the first spaced repetition algorithm, it took me over 2 years before I decided to recruit the first user. Without Tomasz Kuehn, SuperMemo for Windows might have arrived 2-3 years later. Without Janusz Murakowski, the vital big data of repetition history records in SuperMemo might have been delayed by 1-2 years. When incremental reading came to life in 2000, only I knew it was a monumental thing. However, it took me quite a while to appreciate the extent of that fact. Today, I know neural creativity is a breakthrough tool, but I am still using it half-heartedly and not as often as its simpler alternative: subset review.

1990: First hint

Algorithm SM-17 was in the making for nearly a quarter of a century. While preparing materials for this article, I found in my archive a picture of a matrix named "new strength", with rows marked as "strength" and columns marked as "durability". These were the original names for stability and retrievability used in the years 1988-1990. Like an old fossil, the paper tells me that the idea of Algorithm SM-17 must have been born around 1990.

A "new strength" matrix with rows marked as "strength" (stability) and columns marked as "durability" (retrievability)
A "new strength" matrix with rows marked as "strength" (stability) and columns marked as "durability" (retrievability)

Figure: A picture of a matrix named "new strength" with rows marked as "strength" and columns marked as "durability". These were the original names for stability and retrievability used in the years 1988-1990. The paper suggests that the idea of Algorithm SM-17 must have been born around 1990.

From the very early days of the two component model of memory, I wanted to build an algorithm around it. My motivation was always half-hearted: SuperMemo worked well enough to make this seem just a neat theoretical exercise. However, today I see that the algorithm provides data for a model that can answer many questions about memory. Some of those questions have actually never been asked (e.g. about the subcomponents of stability). This is also similar to SuperMemo itself: it has always struggled because its value is hard to appreciate in theory. Practical effects are what changes the minds of good students most easily.

1993: Distraction

In 1993, my own thinking was an inhibitor of progress. Further explorations of the algorithm were secondary; they would not benefit the user that much. Memory modelling was pure blue-sky research. See: Futility of fine-tuning the algorithm. At that time, it was Murakowski who pushed hardest for progress. He kept complaining that "SuperMemo keeps leaking Nobel Prize value data". He screamed at me with words verging on abuse: "implement repetition histories now!". It was simply a battle of priorities. We had a new Windows version of SuperMemo, the arrival of audio data, the arrival of CD-ROM technology, and the arrival of solid competition, incl. Young Digital Poland, who beat us to the title of the first software on CD-ROM in Poland by a month or so. We still cherish the claim to the first Windows CD-ROM in Poland. The first SuperMemo 7 CD-ROM title was actually still produced in the US, but the contents were made entirely in Poland. It was naturally hosted by 100% pure Polish SuperMemo.

1996: Venture capital

In February 1996, all obstacles had been cleared, and SuperMemo finally started collecting full repetition history data (at that time, it was still added only as an option, to prevent clogging lesser systems with data that would largely go unused). My own repetition history data now reaches largely back to 1992-1993. This is possible because, in February 1996, all items still had the date of their last repetition easily derivable from the then-current interval. I even have quite a number of histories dating back to 1987. In my OCD for data, I recorded the progress of some specific items manually, and was later able to complete their repetition histories by manual editing. My own data is now, therefore, the longest-spanning repetition history data in spaced repetition in existence: 30 years of data with massive coverage of 22-25 years of learning. This is a goldmine.

On Sunday evening, Sep 29, 1996, I devoted two hours to sketching the new algorithm based on the two component model of memory. It all seemed very simple and appeared to require little work. SuperMemo had just started collecting repetition histories, so I expected to have plenty of data at hand. Our focus was switching from multimedia courses, like Cross Country, to easier projects, like Advanced English. It seemed like a good moment. Unfortunately, the next day we got a call from Antek Szepieniec, who had been talking to investors in America with a dream of making SuperMemo World the first Polish company on NASDAQ. He excitedly prophesied a good chance of an injection of a few million dollars of venture capital into our efforts. That instantly tossed me into new roles and new jobs. On the bad side, this delayed Algorithm SM-17 by two decades. On the good side, the concept of Hypermedia SuperMemo, aka Knowledge Machine, aka incremental reading, gained a great deal of momentum in terms of theory and design. Practice trumped science again.

2005: Theoretical approach

In 2000, with incremental reading, and then in 2006, with the priority queue, the need to delay repetitions and the need for early review increased dramatically. This called for huge departures from optimum spacing. The old Algorithm SM-8 could not cope with that effectively. The function of optimum intervals had to be expanded into the dimension of time (i.e. retrievability). We needed a stability increase function.

One of the very interesting dynamics of progress in science is that dendritic explorations of reality often require a critical brain mass to push a new idea through. In 2005, Biedalak and others were largely out of the loop, busy with promoting SuperMemo as a business. I was on the way to a major breakthrough in incremental reading: handling overload. With the emergence of Wikipedia, it suddenly appeared that importing tons of knowledge requires little effort, but low-priority knowledge can easily overwhelm high-priority knowledge by sheer volume. Thus richness undermines the quality of knowledge. My solution to the problem was to employ the priority queue. It was to be implemented only in 2006. In the meantime, Gorzelanczyk and Murakowski were busy with their own science projects.

Gorzelanczyk used to attend a cybernetics conference in Cracow powered by my early inspiration: Prof. Ryszard Tadeusiewicz. For his presentation in 2005, Gorzelanczyk suggested we update our memory model. With the deluge of new data in molecular biology, a decade since the last formulation could make a world of difference. I thought that my ideas for finding the formula for building memory stability would make a good complement. This initial spark soon gained momentum in exchanges with Murakowski. Without those three brains (i.e. Wozniak, Gorzelanczyk, Murakowski) working in concert and whipping up the excitement, the next obvious step would not have been made. Using the tools first employed in the model of intermittent learning in 1990, I set out to find the function of stability increase. Once my computer started churning data, interesting titbits of information kept flowing serially. The job was supposed to take just a few evenings. In the end, it took half the winter.

2013: Big picture re-awakening

Like in 2005, in 2013 the critical brain mass had to build up to push the new solutions through. However, I must give most of the credit to Biedalak. It was he who tipped the scales. In his never-ending battle for the recognition of SuperMemo's leadership and pioneering claims, he demanded we go on with the project and sent me on a short creative vacation to complete it. It was to be just a one-winter project; it turned out to take two years, and it still consumes a lot of my time.

On Nov 09, 2014, we took a 26 km walk to discuss the new algorithm. Walktalking is our best form of brainstorming and always bears great fruit. The next day, we met in a swimming pool, joined by Leszek Lewoc, a worshipper of big data, who always has a great deal of fantastic ideas (I first met Lewoc in 1996, and his wife was probably writing a thesis about language learning, incl. SuperMemo, as early as 1992). The simple conclusions from that brainstorming were to use the two component model of memory to simplify the approach to the algorithm, simplify the terminology, and make it more human-friendly (no more A-Factors, U-Factors, R-Factors, etc.).

Increase in memory stability with rehearsal

To understand Algorithm SM-17, it is helpful to understand the calculations used to figure out the formula for the increase in memory stability. In 2005, our goal was to find the function of stability increase for any valid level of R and S: SInc=f(R,S). The goals and tools were pretty similar to those used in the quest to build the model of intermittent learning (1990).

Archive warning: Why use literal archives?
Until 2005, we were not able to formulate a universal formula that would link a repetition with an increase in memory stability. Repetition spacing algorithms were based on a general understanding of how stability increases when so-called optimum inter-repetition intervals are used (defined as intervals that produce a known recall rate that usually exceeds 90%). The term optimum interval is used for interval's applicability in learning. The said repetition spacing algorithms also allow for determining an accurate stability increase function for optimum intervals in a matrix form. However, little has been known about the increase in stability for low retrievability levels (i.e. when intervals are not optimum). With data collected with the help of SuperMemo, we can now attempt to fill in this gap. Although SuperMemo has been designed to apply the optimum intervals in learning, in real-life situations, users are often forced to delay repetitions for various reasons (such as holiday, illness, etc.). This provides for a substantial dose of repetitions with lower retrievability in nearly every body of learning material. In addition, in 2002, SuperMemo introduced the concept of a mid-interval repetition that makes it possible to shorten inter-repetition intervals. Although the proportion of mid-interval repetitions in any body of data is very small, for sufficiently large data samples, the number of repetition cases with very low and very high retrievability should make it possible to generalize the finding on the increase in memory stability from the retrievability of 0.9 to the full retrievability range.

To optimally build memory stability through learning, we need to know the function of optimum intervals, or, alternatively, the function of stability increase (SInc). These functions take three arguments: memory stability (S), memory retrievability (R) and difficulty of knowledge (D). Traditionally, SuperMemo has always focused on the dimensions S and D, as keeping retrievability high is the chief criterion of the optimization procedure used in computing inter-repetition intervals. The focus on S and D was dictated by practical applications of the stability increase function. In the presented article, we focus on S and R as we attempt to eliminate the D dimension by analyzing "pure knowledge", i.e. non-composite memory traces that characterize knowledge that is easy to learn. Eliminating the D dimension makes our theoretical divagations easier, and the conclusions can later be extended to composite memory traces and knowledge considered difficult to learn. In other words, as we move from practice to theory, we shift our interest from the (S,D) pair to the (S,R) pair. In line with this reasoning, all investigated data sets have been filtered for item difficulty. At the same time, we looked for the largest possible sets in which the representation of items with low retrievability would be large enough as a result of delays in rehearsal (in violation of the optimum spacing of repetitions).

We have developed a two-step procedure that was used to propose a symbolic formula for the increase in stability for different retrievability levels in data sets characterized by low and uniform difficulty (so-called well-formulated knowledge data sets that are easy to retain in memory). Well-formulated and uniform learning material makes it easy to distill a pure process of long-term memory consolidation through rehearsal. As discussed elsewhere in this article, ill-formulated knowledge results in superposition of independent consolidation processes and is unsuitable for the presented analysis.

Two-step computation

In SuperMemo 17, it is possible to run through the full record of repetition history to collect stability increase data. This makes it possible to plot a graphic representation of the SInc[] matrix. That matrix may then be used in an effort to find a symbolic approximation of the function of stability increase. The same reasoning was used in 2005, though the procedure was much simpler. This can then be used to better understand Algorithm SM-17:

Archive warning: Why use literal archives?
The two-step procedure for determining the function of the increase in memory stability SInc:
  • Step 1: Using a matrix representation of SInc and an iterative procedure to minimize the deviation Dev between the grades in a real learning process (data) and the grades predicted by SInc. Dev is defined as a sum of R-Pass over a sequence of repetitions of a given piece of knowledge, where R is retrievability and Pass is 1 for passing grades and 0 for failing grades
  • Step 2: Using a hill-climbing algorithm to solve a least-square problem to evaluate symbolic candidates for SInc that provide the best fit to the matrix SInc derived in Step 1

Computing stability increase

The matrix of stability increase (SInc[]) was computed in Step 1. In 2005, we could take any initial hypothetical plausible value of SInc. Today, as we know the approximate nature of the function, we can speed up the process and make it non-iterative (see Algorithm SM-17).

Archive warning: Why use literal archives?
Let us define a procedure for computing the stability of memory for a given rehearsal pattern. This procedure can be used to compute stability on the basis of known grades scored in learning (practical variant) and to compute stability on the basis of repetition timing only (theoretical variant). The only difference between the two is that the practical variant allows for the correction of stability as a result of stochastic forgetting reflected by failing grades.

In the following passages we will use the following notation:

  • S(t) - memory stability at time t
  • S[r] - memory stability after the rth repetition (e.g. with S[1] standing for memory stability after learning a new piece of knowledge)
  • R(S,t) - memory retrievability for stability S and time t (we know that R=exp(-k*t/S) and that k=ln(10/9)); see the sketch after this list
  • SInc(R,S) - increase in stability as a result of a rehearsal for retrievability R and stability S such that SInc(R(S,t),S(t))=S(t)/S(t')=S[r]/S[r-1] (where: t' and t stand for the time of rehearsal as taken before and after memory consolidation with t-t' being indistinguishable from zero)

Our goal is to find the function of stability increase for any valid level of R and S: SInc=f(R,S).

If we take any plausible initial value of SInc(R,S), and use S[1]=S1, where S1 is the stability derived from the memory decay function after the first-contact review (for optimum inter-repetition interval), then for each repetition history we can compute S using the following iteration:

r:=1;
S[r]:=S1;
repeat
  t:=Interval[r]; // where: Interval[r] is taken from a learning process (practical variant) or from the investigated review pattern (theoretical variant)
  Pass:=(Grade[r]>=3); // where: Grade[r] is the grade after the r-th interval (practical variant) or 4 (theoretical variant)
  R:=Ret(S[r],t);
  if Pass then begin
    S[r+1]:=S[r]*SInc[R,S[r]];
    r:=r+1;
  end else begin
    r:=1;
    S[r]:=S1;
  end;
until (r is the last repetition)
In Algorithm SM-8, we can use the first-interval graph to determine S1, which is progressively shorter after each failing grade.
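For illustration, the archived pseudocode above can be read as the following minimal Python sketch (the names replay_history, ret and sinc_matrix are ours, not SuperMemo's; on a failing grade the stability estimate simply resets to S1 and the walk through the recorded history continues):

import math

K = math.log(10 / 9)

def ret(S: float, t: float) -> float:
    # retrievability after an interval of t for a memory of stability S
    return math.exp(-K * t / S)

def replay_history(intervals, grades, S1, sinc_matrix):
    # Walk one item's repetition history and return the stability estimate after each review.
    S = S1
    stabilities = []
    for interval, grade in zip(intervals, grades):
        R = ret(S, interval)
        if grade >= 3:                 # passing grade: stability grows by SInc[R,S]
            S = S * sinc_matrix(R, S)
        else:                          # failing grade: relearning, stability resets
            S = S1
        stabilities.append(S)
    return stabilities

# Example with a crude constant stand-in for the SInc[R,S] lookup (illustration only):
print(replay_history(intervals=[1, 3, 9, 25], grades=[4, 4, 3, 5], S1=1.0,
                     sinc_matrix=lambda R, S: 2.5))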

We start the iterative process with a hypothetical initial value of matrix SInc[R,S], e.g. with all entries arbitrarily set to E-Factor as in Algorithm SM-2.

We can then keep using the above procedure on the existing repetition history data to compute a new value of SInc[R,S] that provides a lesser deviation from grades scored in the actual learning process (we use differences R-Pass for the purpose).

Incremental improvements are possible if we observe that:

Archive warning: Why use literal archives?
  • if Pass=true and S[r]<Interval[r] then SInc[R,S[r-1]] entry is underestimated (and can be corrected towards Interval[r]/S[r]*SInc[R,S[r-1]])
  • if Pass=false and S[r]>Interval[r] then SInc[R,S[r-1]] entry is overestimated

We can iterate over SInc[] to bring its value closer and closer to the alignment with grades scored in the learning process.

This approach makes it possible to arrive at the same final SInc[R,S] independent of the original value of SInc[R,S] set at initialization.
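A sketch of this incremental correction might look as follows; the nudge rate and the matrix indexing are our assumptions, not SuperMemo's actual code:

def nudge_entry(sinc, R_bin, S_bin, passed, interval, predicted_S, rate=0.1):
    # Move one SInc[R,S] entry towards agreement with the observed repetition outcome.
    current = sinc[R_bin][S_bin]
    if passed and predicted_S < interval:
        # the memory survived longer than predicted: the entry is underestimated
        target = current * interval / predicted_S
        sinc[R_bin][S_bin] = current + rate * (target - current)
    elif not passed and predicted_S > interval:
        # the memory failed earlier than predicted: the entry is overestimated
        sinc[R_bin][S_bin] = current * (1 - rate)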

In Algorithm SM-17, instead of the above bang-bang incremental approach, we use actual forgetting curves to provide a better estimate of retrievability, which can then be used to correct the estimated stability. The ultimate stability estimate combines the theoretical prediction of retrievability, actual recall taken from forgetting curves (weighted for the availability of data), and the actual grade combined with the interval as in the above reasoning. By combining those three sources of information, Algorithm SM-17 can provide stability/interval estimates without the need to iterate over the SInc[] matrix over and over again.
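The exact weighting scheme is not spelled out here; purely as an illustration of blending three sources of evidence, a sketch (the function name and weights are our assumptions, not SuperMemo's) could look like this:

def combined_stability(theoretical_S, curve_S, curve_cases, grade_S):
    # Blend the theoretical estimate, the forgetting-curve estimate (weighted by the
    # amount of data behind it), and the estimate implied by the grade and the interval.
    estimates = [theoretical_S, curve_S, grade_S]
    weights = [1.0, min(curve_cases / 100.0, 10.0), 1.0]
    return sum(w * s for w, s in zip(weights, estimates)) / sum(weights)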

Symbolic formula for stability increase

After many iterations, we obtain a value of SInc that minimizes the error. The procedure is convergent. With the matrix of stability increase available, we can look for a symbolic formula expressing the increase in stability.

Dependence of stability increase on S

Predictably, SInc decreases with an increase in S. This phenomenon, named stabilization decay, can now be inspected in SuperMemo for Windows. Here are the original findings from 2005:

Archive warning: Why use literal archives?
In Step 2, we will use the SInc[R,S] matrix obtained here to obtain a symbolic formula for SInc.

Step 2 - Finding SInc as a symbolic formula

We can now use any gradient descent algorithm to evaluate symbolic candidates for SInc that provide the best fit to the matrix SInc derived above.

When inspecting the SInc matrix, we immediately see that SInc as a function of S for constant R is excellently described with a negative power function as in the exemplary data set below:

SInc as a function of S for constant R is excellently described with a negative power function

This is even clearer in the log-log version of the same graph:

The conclusion on the power dependence of SInc on S above confirms the previous findings. In particular, the decline of R-Factors along repetition categories in SuperMemo has always been best approximated with a power function
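A minimal sketch of such a fit, with made-up data points standing in for one row of the SInc matrix at constant R (scipy is assumed to be available):

import numpy as np
from scipy.optimize import curve_fit

S_values = np.array([2.0, 5.0, 12.0, 30.0, 80.0, 200.0])   # stabilities (days)
sinc_values = np.array([3.9, 3.4, 3.0, 2.6, 2.3, 2.0])     # illustrative SInc at fixed R

def power_law(S, a, b):
    # negative power dependence: SInc = a * S^(-b)
    return a * S ** (-b)

(a, b), _ = curve_fit(power_law, S_values, sinc_values, p0=(4.0, 0.1))
print(f"SInc ~ {a:.2f} * S^(-{b:.3f})")
# In log-log coordinates the same relationship is a straight line:
# log(SInc) = log(a) - b*log(S), which is why the log-log plot looks linear.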

Dependence of stability increase on R

As predicted by the spacing effect, SInc is greater for lower levels of R. Note, however, that the procedure used in 2005 might have introduced an artifact: the survival of a memory trace over time would linearly contribute to the new stability estimate. This is problematic due to the stochastic nature of forgetting. Longer survival of memories may then be a matter of chance. In Algorithm SM-17, more evidence is used to estimate stability, and the survival interval is weighed together with all other pieces of evidence.

Archive warning: Why use literal archives?
When we look for the function reflecting the relationship of SInc and R for constant S, we see more noise in the data due to the fact that SuperMemo provides far fewer hits at low R (its algorithm usually attempts to achieve R>0.9). Nevertheless, upon inspecting multiple data sets we have concluded that, somewhat surprisingly, SInc increases exponentially when R decreases (see later how this increase results in a nearly linear relationship between SInc and time). The magnitude of that increase is higher than expected, and should provide further evidence of the power of the spacing effect. That conclusion should have a major impact on learning strategies.

Here is an exemplary data set of SInc as a function of R for constant S. We can see that SInc=f(R) can be quite well approximated with a negative exponential function:

SInc as a function of R for constant S can be quite well approximated with a negative exponential function

And the semi-log version of the same graph with a linear approximation trendline intercept set at 1:

Interestingly, stability increase for retrievability of 100% may be less than 1. Some molecular research indicates increased lability of memories at review time. This is one more piece of evidence that repetitive cramming can hurt you not only by costing you extra time.

Dependence of stability increase on retrievability (2018)

Despite all the algorithmic differences and artifacts, the dependence of stability increase on retrievability for well-formulated knowledge is almost identical with that derived from data produced 13 years later by Algorithm SM-17.

Recall that in SuperMemo, we use forgetting curves to provide a better estimate of retrievability. This is then used to correct the estimated stability. By combining several sources of information, Algorithm SM-17 can provide more accurate stability estimates. There is still the old artifact of the survival of a memory trace that would linearly contribute to the new stability. This artifact can be weighed out parametrically. However, each time SuperMemo tries to do that, its performance metrics drop.

Despite all the improvements, and much larger data sets (esp. for low R), the dependence of stability increase on retrievability for easy items seems set in stone.

This perfect picture collapses when we add difficult knowledge into the mix. This is partly due to reducing the long survival artifact mentioned above. For that reason, new SuperMemos do not rely on this seemingly well-confirmed memory formula:


Figure: The strength of long-term memory depends on the timing of review. For well-formulated knowledge, long delays in review produce a large increase in memory stability. Optimum review should balance that increase with the probability of forgetting. In the presented graph, the relationship between stability increase and the logarithm of retrievability (log(R)) is linear. Log(R) expresses time. Nearly 27,000 repetitions have been used to plot this graph. Observed memory stability before review spanned from 2 days to 110 days. A maximum increase in stability of nearly 10-fold was observed for the lowest levels of retrievability. The stability increase matrix was generated with Algorithm SM-17 in SuperMemo 17

Memory stability increase formula

With the matrix of stability increase at hand, we could look for a symbolic expression of stability increase. The equation found in 2005 will later be referred to as Eqn. SInc2005. Note that formulas used in Algorithm SM-17 differ:

Archive warning: Why use literal archives?
For constant knowledge difficulty, we applied two-dimensional surface-fitting to obtain the symbolic formula for SInc. We have used a modified Levenberg-Marquardt algorithm with a number of possible symbolic function candidates that might accurately describe SInc as a function of S and R. The algorithm has been enhanced with a persistent random-restart loop to ensure that the global maxima be found. We have obtained the best results with the following formula (Eqn. SInc2005):

SInc = a * S^(-b) * e^(c*R) + d

where:

  • SInc - increase in memory stability as a result of a successful repetition (quotient of stability S before and after the repetition)
  • R - retrievability of memory at the moment of repetition expressed as the probability of recall in percent
  • S - stability of memory before the repetition expressed as an interval generating R=0.9
  • a, b, c, d - parameters that may differ slightly for different data sets
  • e - base of the natural logarithm

The parameters a, b, c, d would vary slightly for different data sets, and this might reflect user-knowledge interaction variability (i.e. different sets of learning material presented to different users may result in a different distribution of difficulty as well as with different grading criteria that may all affect the ultimate measurement).

For illustration, average values of a, b, c, d taken from several data sets have been found to be: a=76, b=0.023, c=-0.031, d=-2, with c varying little from set to set, and with a and d showing relatively higher variance. See the example: How to use the formula for computing memory stability?
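A minimal sketch of evaluating Eqn. SInc2005 with these averaged parameters (the helper name is ours; R is expressed in percent, as defined above):

import math

def sinc_2005(R_percent: float, S: float, a=76, b=0.023, c=-0.031, d=-2) -> float:
    # Eqn. SInc2005: SInc = a * S^(-b) * e^(c*R) + d
    return a * S ** (-b) * math.exp(c * R_percent) + d

# Optimally timed review (R = 90%) of a memory with S = 30 days:
print(round(sinc_2005(90, 30), 2))
# A heavily delayed review (R = 60%) of the same memory yields a much larger increase:
print(round(sinc_2005(60, 30), 2))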

Conclusions derived from stability increase formula

The above formula for stability increase differs slightly from later findings. For example, it seems to underestimate the decline in stability increase with S (low b). However, it can be used to derive a great many interesting conclusions.

Linear increase in value of review over time

Due to the spacing effect, the potential for an increase in memory stability keeps growing over time in a nearly linear fashion:

Archive warning: Why use literal archives?
The above formula produced SInc values that differed on average by 15% from those obtained from data in the form of the SInc matrix on homogeneous data sets (i.e. repetition history sets selected for: a single student, single type of knowledge, low difficulty, and a small range of A-Factors).

As the inter-repetition interval increases, despite the double exponentiation over time, SInc increases along a nearly linear sigmoid curve (the two negative exponentiation operations nearly canceling each other):


Figure: The graph of changes of SInc in time. This graph was generated for S=240 using Eqn. SInc2005.

The nearly linear dependence of SInc on time is reflected in SuperMemo by computing the new optimum interval by multiplying O-Factor by the actually used inter-repetition interval, not by the previously computed optimum interval (in SuperMemo, O-Factors are entries of a two-dimensional matrix OF[S,D] that represent SInc for R=0.9).
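The near-linear shape can be reproduced with a short sketch, again assuming Eqn. SInc2005 with the averaged parameters quoted earlier:

import math

K = math.log(10 / 9)
a, b, c, d = 76, 0.023, -0.031, -2

def sinc_over_time(t: float, S: float = 240) -> float:
    R_percent = 100 * math.exp(-K * t / S)        # retrievability decays exponentially with time
    return a * S ** (-b) * math.exp(c * R_percent) + d

for t in (60, 120, 240, 480, 720):
    # the exponential decay of R and the exponential dependence of SInc on R largely
    # cancel out, so the printed values grow roughly linearly with t
    print(t, round(sinc_over_time(t), 2))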

Expected increase in memory stability

Optimization of learning may use various criteria. We may optimize for a specific recall level or for maximization of the increase in memory stability. In both cases, it is helpful to understand the expected level of stability increase.

Let's define the expected value of the increase in memory stability as:

E(SInc)=SInc*R

where:

  • R - retrievability
  • SInc - increase in stability
  • E(SInc) - expected probabilistic increase in stability (i.e. the increase defined by SInc diminished by the possibility of forgetting)

The formula for stability increase derived in 2005 produced a major surprise. We used to claim that the best speed of learning can be achieved with the forgetting index of 30-40%. Eqn SInc2005 seemed to indicate that very low retention can bring pretty good memory effects. Due to the scarcity of low-R data back in 2005, those conclusions need to be taken with caution:

Archive warning: Why use literal archives?
From Eqn SInc2005 we have E(SInc)=(a*S^(-b)*e^(c*R)+d)*R. By finding the derivative dE(SInc)/dR, and equating it with zero, we can find the retrievability that maximizes the expected increase in stability for various levels of stability:


Figure: Consolidation curve: Expected increase in memory stability E(SInc) as a function of retrievability R for stability S derived from Eqn (SInc2005). Using the terminology known to users of SuperMemo, the maximum expected increase in memory stability for short intervals occurs for the forgetting index equal to 60%! This also means that the maximum forgetting index allowed in SuperMemo (20%) results in the expected increase in stability that is nearly 80% less than the maximum possible (if we were only ready to sacrifice high retention)


Figure: Expected increase in memory stability E(SInc) as a function of retrievability R and stability S as derived from Eqn SInc2005
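The same criterion can be explored numerically; a minimal sketch using Eqn. SInc2005 with the averaged parameters (an assumption; actual data sets yield somewhat different optima):

import math

a, b, c, d = 76, 0.023, -0.031, -2

def expected_sinc(R_percent: float, S: float) -> float:
    # E(SInc) = SInc * R: the stability gain diminished by the probability of recall
    sinc = a * S ** (-b) * math.exp(c * R_percent) + d
    return sinc * (R_percent / 100)

def best_R(S: float) -> int:
    # retrievability (in percent) that maximizes the expected stability gain
    return max(range(1, 100), key=lambda R: expected_sinc(R, S))

for S in (1, 10, 100):
    print(S, best_R(S))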

Memory complexity in spaced repetition

Memory stability in spaced repetition depends on the quality of review, which in turn depends on memory complexity. As early as 1984, I used that principle in my own learning, in what later became known as the minimum information principle. For effective review, knowledge associations need to be simple (even if knowledge itself is complex). Items may build a complex structure of knowledge, but the individual memories subject to review should be atomic.


Figure: Memory complexity illustrates the importance of the minimum information principle. When memorizing simple questions and answers, we can rely on a simple memory connection, and uniformly refresh that connection at review. Complex memories may have their concepts activated in an incomplete fashion, or in a different sequence that depends on the context. As a result, it is hard to produce a uniform increase in memory stability at review. Complex items are difficult to remember. An example of a simple item may be a word pair, e.g. apple = pomo (Esperanto). While a complex net of connections may be needed to recognize an apple, the connection between apple and pomo is irreducible (i.e. maximally simplified)

In 2005, we found a formula that governs the review of complex memories. Georgios Zonnios was once an inquisitive teen user of SuperMemo. Today, he is an education innovator, and a rich creative contributor to many of the ideas behind SuperMemo (incl. the Neurostatistical Model of Memory). He noticed:

Stability in the formula for the stability of complex items is like resistance in an electronic circuit: many parallel resistors allow for leaks in the current

Incidentally, in the early days of incremental reading, Zonnios independently arrived at the concept of incremental writing, which today may seem like an obvious step in employing the tools of incremental reading in creativity. This article has also been written by means of incremental writing.

This is how memories for complex items have been described and analyzed in 2005:

Archive warning: Why use literal archives?
The difficulty in learning is determined by the complexity of remembered information. Complex knowledge results in two effects:
  • increased interference with other pieces of information
  • difficulty in uniform stimulation of memory trace sub-components at review time

Both effects can be counteracted with the application of appropriate representation of knowledge in the learning process.

Let us see how complexity of knowledge affects the build up of memory stability.

Imagine we would like to learn the following: Marie Sklodowska-Curie was a sole winner of the 1911 Nobel Prize for Chemistry. We can take two approaches: one in which knowledge is kept complex, and one with easy formulations. In a complex variant, a double cloze might have been formulated for the purpose of learning the name of Marie Curie and the year in which she received the Nobel Prize.

Q: [...] was a sole winner of the [...] Nobel Prize for Chemistry
A: Marie Sklodowska-Curie, 1911

In a simple variant, this double cloze would be split and the Polish maiden name would be made optional and used to create a third cloze:

Q: [...] was a sole winner of the 1911 Nobel Prize for Chemistry
A: Marie (Sklodowska-)Curie

Q: Marie Sklodowska-Curie was a sole winner of the [...](year) Nobel Prize for Chemistry
A: 1911

Q: Marie [...]-Curie was a sole winner of the 1911 Nobel Prize for Chemistry
A: Sklodowska

In addition, in the simple variant, a thorough approach to learning would require formulating two more cloze deletions, as Marie Curie was also a winner of the 1903 Nobel Prize for Physics (as well as other awards):

Q: Marie Sklodowska-Curie was a sole winner of the 1911 Nobel Prize for [...]
A: Chemistry

Q: Marie Sklodowska-Curie was a sole winner of the 1911 [...]
A: Nobel Prize (for Chemistry)

Let us now consider the original composite double cloze. For the sake of argument, let's assume that remembering the year 1911 and remembering the name Curie are equally difficult. The retrievability of the composite memory trace (i.e. the entire double cloze) will be a product of the retrievabilities of its subtraces. This comes from the general rule that memory traces, in most cases, are largely independent. Although forgetting one trace may increase the probability of forgetting the other, in the vast majority of cases, as proved by experience, separate and different questions pertaining to the same subject can carry an entirely independent learning process, in which recall and forgetting are entirely unpredictable. Let us see how treating probabilities of recall as independent events affects the stability of a composite memory trace:

(9.1) R=Ra*Rb

where:

  • R - retrievability of a binary composite memory trace
  • Ra and Rb - retrievability of two independent memory trace subcomponents (subtraces): a and b

(9.2) R=exp(-k*t/Sa)*exp(-k*t/Sb)=exp(-k*t/S)

where:

  • t - time
  • k - ln(10/9)
  • S - stability of the composite memory trace
  • Sa and Sb - stabilities of memory subtraces a and b

(9.3) -k*t/S=-k*t/Sa-k*t/Sb=-k*t*(1/Sa+1/Sb)

(9.4) S=Sa*Sb/(Sa+Sb)

We used Eqn. (9.4) in further analysis of composite memory traces. We expected that if, initially, the stabilities of memory subtraces Sa and Sb differed substantially, subsequent repetitions optimized for maximizing S (i.e. with the criterion R=0.9) might weaken the stability of the subcomponents due to the sub-optimal timing of review. We showed this not to be the case. Substabilities tend to converge in the learning process!
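Eqn. (9.4) is the parallel-resistor analogy mentioned earlier; a minimal sketch with the substabilities used in the convergence example later in this article:

def composite_stability(Sa: float, Sb: float) -> float:
    # Eqn. (9.4): the composite trace is always weaker than either of its subtraces
    return Sa * Sb / (Sa + Sb)

print(round(composite_stability(1, 30), 2))   # 0.97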


Figure: Keeping memories simple in learning is essential (see: Minimum information principle). Complex models of knowledge can be represented by simple memories. Simplicity improves memory retention in the long run. The impact of simplicity on the stability of memory is an important contribution of the two component model of memory to proving the need for the existence of grandmother cells. Human intelligence depends on a system of concept maps, which in turn owe their stability to the simplicity of individual memories

Stability of complex memories can be derived from substabilities of atomic memories

The fact that memory traces for complex memories contribute to the difficulty of retaining knowledge in the long term is a hint that the neocortex cannot possibly use a connectionist approach to storing memories. This is an important new argument for the existence of neurons called grandmother cells (for more see: The truth about grandmother cells). The picture below helps understand how memory conceptualization proceeds over time:


Figure: Uncertain course of the stabilization of complex memories. The picture shows a hypothetical course of stabilization, forgetting, generalization, and interference on the example of a single dendritic input pattern of a single concept cell. The neuron, dendrites and dendritic filopodia are shown in orange. The picture does not show the conversion of filopodia into dendritic spines whose morphology changes over time with stabilization. The squares represent synapses involved in the recognition of the input pattern. Each square shows the status of the synapse in terms of the two component model of long-term memory. The intensity of red represents retrievability. The size of the blue area represents stability. After memorizing a complex memory pattern, the concept cell is able to recognize the pattern upon receiving a summation of signals from the red squares representing a new memory of high retrievability and very low stability. Each time the cell is re-activated, active inputs will undergo stabilization, which is represented by the increase in the blue area in the input square. Each time a signal does not arrive at an input while the concept cell is active, its stability will drop (generalization). Each time a source axon is active and the target neuron fails to fire, the stability will drop as well (competitive interference). Due to the uneven input of signal patterns to the concept cell, some synapses will be stabilized, while others will be lost. Forgetting occurs when a synapse loses its stability and its retrievability, and when the relevant dendritic spine is retracted. Generalization occurs when the same concept cell can be re-activated using a smaller but more stable input pattern. Retroactive interference occurs when a new input pattern contributes to forgetting some of the redundant inputs necessary for the recognition of the old input pattern. Stabilization of the old patterns results in the reduced mobility of filopodia, which prevents the takeover of a concept by new patterns (proactive interference). At the very end of the process, a stable and well-generalized input pattern is necessary and sufficient to activate the concept cell. The same cell can respond to different patterns as long as they are consistently stabilized. In spaced repetition, a poor choice of knowledge representation will lead to poor reproducibility of the activation pattern, unequal stabilization of synapses, and forgetting. Forgetting of an item will occur when the input pattern is unable to activate sufficiently many synapses and thus unable to reactivate the concept cell. At repetition, depending on the context and the train of thought, an item may be retrieved or forgotten. The outcome of the repetition is uncertain

Convergence of sub-stabilities for composite memory traces

It was easy to simulate the behavior of complex memories in spaced repetition. Their substabilities tend to converge. This leads to inefficient review and a slow buildup of stability. Today we can show that at a certain level of complexity, it is no longer possible to build memory stability for long-term retention. In short, there is no way to remember a book other than just re-reading it endlessly. This is a futile process.

Archive warning: Why use literal archives?
If we generate a double-cloze, we are not really sure if a single repetition generates a uniform activation of both memory circuits responsible for storing the two distinct pieces of knowledge. Let us assume that the first repetition is the only differentiating factor for the two memory traces, and that the rest of the learning process proceeds along the formulas presented above.

To investigate the behavior of stability of memory subtraces under a rehearsal pattern optimized for composite stability with the criterion R=0.9, let us take the following:

  • Sa=1
  • Sb=30
  • S=Sa*Sb/(Sa+Sb) (from Eqn. 9.4)
  • SInc=aS-b*ecR+d (from Eqn. SInc2005)
  • composite memory trace is consolidated through rehearsal with R=0.9 so that both subtraces are equally well re-consolidated (i.e. the review of the composite trace is to result in no neglect of subtraces)

As can be seen in the following figure, memory stability for the composite trace will always be less than the stability for individual subtraces; however, the stabilities of subtraces converge.

Convergence of stability for memory sub-traces rehearsed with the same review pattern optimized for the entire composite memory trace (i.e. review occurs when the composite retrievability reaches 0.9)

Figure: Convergence of stability for memory sub-traces rehearsed with the same review pattern optimized for the entire composite memory trace (i.e. review occurs when the composite retrievability reaches 0.9). The horizontal axis represents the number of reviews, while the vertical axis shows the logarithm of stability. Blue and red lines correspond with the stability of two sub-traces which substantially differed in stability after the original learning. The black line corresponds with the composite stability (S=Sa*Sb/(Sa+Sb)). The disparity between Sa and Sb self-corrects if each review results in a uniform activation of the underlying synaptic structure.
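A minimal simulation sketch reproducing this convergence, assuming Eqn. SInc2005 with the averaged parameters quoted earlier and review timed so that the composite retrievability is 0.9 (i.e. the interval equals the composite stability):

import math

K = math.log(10 / 9)
a, b, c, d = 76, 0.023, -0.031, -2

def sinc(R: float, S: float) -> float:
    # Eqn. SInc2005 with R given as a probability (the formula expects percent)
    return a * S ** (-b) * math.exp(c * 100 * R) + d

Sa, Sb = 1.0, 30.0
for review in range(1, 11):
    S = Sa * Sb / (Sa + Sb)                       # composite stability (Eqn. 9.4)
    t = S                                         # interval at which composite R reaches 0.9
    Ra, Rb = math.exp(-K * t / Sa), math.exp(-K * t / Sb)
    Sa, Sb = Sa * sinc(Ra, Sa), Sb * sinc(Rb, Sb)
    print(review, round(Sa, 1), round(Sb, 1), round(Sb / Sa, 2))
# The weaker subtrace sees lower retrievability at each review and gets the larger
# stability increase (spacing effect), so the ratio Sb/Sa keeps shrinking towards 1.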

Composite stability increase

Archive warning: Why use literal archives?
Let us now figure out how much SInc differs for the composite stability S and for the subtrace stabilities Sa and Sb. If we assume identical stimulation of the memory subtraces, and denote SInca and SIncb as i, then for repetition number r we have:

SInca=SIncb=i

Sa[r]=Sa[r-1]*i
Sb[r]=Sb[r-1]*i

S[r]=Sa[r]*Sb[r]/(Sa[r]+Sb[r])=
=Sa[r-1]*Sb[r-1]*i^2/(Sa[r-1]*i+Sb[r-1]*i)=
=i*(Sa[r-1]*Sb[r-1])/(Sa[r-1]+Sb[r-1])=i*S[r-1]

In other words:

(11.1) SInc=i=SInca=SIncb

The above demonstrates that, with the presented model, the increase in memory stability is independent of the complexity of knowledge, assuming equal re-consolidation of the memory subtraces.
Composite stability increase is the same as the increase in stability of sub-traces