Search for a universal memory formula


This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)

Optimum review vs. intermittent review

By 1990, I had no doubt: I had a major discovery at hand. I had cracked the problem of forgetting. I knew the optimum timing of review for simple memories. Once I secured permission to describe my findings in my Master's Thesis, my appetite for discovery kept growing. I hoped I might find a universal formula for long-term memory: a formula that would help me track the behavior of memory for any pattern of exposure or retrieval.

I already had a collection of data that might help me find the formula. Before discovering the optimum spacing of repetitions in 1985, I had used pages of questions for the review of knowledge. The review was chaotic, determined by the availability of time, the need, or the mood. I called that "intermittent learning". I had recall data for individual pages and for each review. That was the ideal kind of data: it did not have the periodicity of SuperMemo. It was the exact kind of data needed to solve the problem of memory. However, I had that data on paper only.

In Spring 1990, I recruited my sister to do the typing. No, I do not have a younger sister who would do that eagerly. My sister was 17 years my senior. Being a bit inconsiderate of her time, I used her love to make her do the donkey work. I feel guilty about it. She died just two years later. I never had a chance to repay her contribution to the theory of spaced repetition, which she never even had a chance to understand. Starting on May 1, 1990, she used my time away from the PC to transfer the data from paper to the computer. It took her many days of slow typing. It was worth it.

Model of intermittent learning

Throughout the summer of 1990, instead of focusing on my Master's Thesis, I worked on the "model of intermittent learning". It was not unusual for me to work for 10 hours straight, or go to sleep at 7 am empty-handed, or leave the computer churning the numbers overnight.

Persistence and tinkering pay. Only teens can afford them, and they should be given the space and the freedom to do so. Despite being 28 years old, I was tolerated at home pretty well, like an immature teen. I lived at my sister's apartment, where I could leech on her kindness. Long hours at the computer were excused as "working on my Master's Thesis". The truth was that nobody asked me to do it, nobody demanded it, and it did not even push SuperMemo much ahead. It was a sheer case of scientific curiosity. I just wanted to know how memory works.

I had dozens of pages of questions and their repetition histories. I tried to predict "memory lapses per page". I used root-mean-square deviation for lapse prediction (denoted below as Dev). By Tuesday, Jul 10, 1990, I had reached Dev<3 and felt like the problem was almost "solved". On Jul 12, 1990, I improved to Dev=2.877 (incidentally, my Thesis speaks of 2.887241). However, by Aug 27, 1990, I declared the problem unsolvable. My notes from that day say:

Personal anecdote. Why use anecdotes?
Aug 27, 1990: I solved the problem of intermittent learning showing that it is unsolvable! One parameter is not able to describe the strength of memory related to the whole page of items. This shows that there are no optimal intervals for items with low E-factors!

On Aug 30, 1990, I described the model for my Master's Thesis. The text covered 15 pages that do not make for good reading. I bet nobody has ever had the patience to read it all. That chapter was not even published at supermemo.com when my Master's Thesis was put on-line in excerpts in the late 1990s.

However, the conclusions drawn on the basis of the model had a profound effect on my thinking about memory in the decades that followed. The whole idea behind the model is actually reminiscent of the optimizations used to deliver Algorithm SM-17 (2014-2016).

When I declared the problem unsolvable, I meant that I could not accurately describe the memory of "difficult pages", as heterogeneous materials require more complex models. However, my notes from Aug 31, 1990 sound far more optimistic:

Personal anecdote. Why use anecdotes?
Aug 31, 1990: Non-stop work on the intermittent learning model. By night, the computer did not manage to get me closer to the solution. However, I had a great idea to calculate optimal intervals using the record-breaking function of the IL model. When I saw the results on the screen I could not believe my [bleep] eyes. These were exactly the same intervals, which I found in 1985 while trying to formulate the SuperMemo method. I was happy like a dog with two tails jumping around the house. So I can say that I really solved the IL problem (compare August 27, 1990). But this success was not everything I was given to discover today. I found that: [...] my formulas work only when intervals are not much shorter than the previous strength.

Past (1990) vs. Present (2018)

Conclusions at the end of the chapter, and the procedure itself, are reminiscent of the methodology I used in 2005 when looking for the universal formula for memory stability increase, and in 2014, when Algorithm SM-17 was based on a far more accurate mathematical description of memory. Like the newest SuperMemo algorithm, the model made it possible to compute retention for any repetition schedule. Naturally, it was far less accurate, as it was based on inferior data. Moreover, what SuperMemo 17 does in real time took many hours of PC time back in 1990.

This old, seemingly boring portion of my Master's Thesis has grown in importance by now. I dare say that only inferior data separated that work from Algorithm SM-17, which emerged 25 long years later. I quote the text with minor notational and stylistic improvements, omitting the chapter about forgetting curves, which was erroneous due to the highly heterogeneous material used in computations:

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

Model of intermittent learning

The SuperMemo model provides a basis for the calculation of optimal intervals that should separate repetitions in the process of time optimal learning.

However, it does not make it possible to predict the changes of memory variables if repetitions are done at irregular intervals.

Below I present an attempt to augment the SuperMemo model so that it can be used in the description of the process of intermittent learning.

In Chapter 3, I mentioned the way in which I had learned English and biology before Algorithm SM-0 was developed.

Data collected during that time (1982-1984) provide an excellent basis for the construction of the model of intermittent learning. Items, formulated in compliance with the minimum information principle (usually having the form of pairs of words) were grouped in pages subject to the irregular review process.

The collected data, available in computer-readable form, include the description of repetitions of 71 pages and, in addition, 80 similar pages participating in a process supervised by the SM-0 schedule.

Similarity to Algorithm SM-17

Note that the formulation of the problem is reminiscent of the procedure used to compute the stability increase matrix (SInc[]) in Algorithm SM-17. Memory stability was rescaled to make it possible to interpret it as an interval. Even the symbols are similar: S for stability and D for deviation. Page lapses substituted for retrievability.

I loved playing with various optimization algorithms. You can still visually observe in SuperMemo 17 how the algorithm runs surface fitting optimizations (see picture). Doing it with 12 variables might have been a bit inefficient, but I never cared about the method as long as I got interesting results that provided new insights into how memory works.

For those familiar with Algorithm SM-17, we changed the notation in the text below. In particular, we changed symbols such as In and Ln, which in print could easily be misread as logarithms.

The list of changes:

  • Ln -> Lapsn
  • In -> Intn
  • Dn -> Devn
  • R -> RepNo

Formulation of the problem of intermittent learning

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

11.1. Formulation of the problem of intermittent learning

  1. There are 161 pages.
  2. Each page contains about 40 items.
  3. For each page, the description of the learning process (collected during experimental repetitions) has the following form:

  4. ((-,Laps1),(Int2,Laps2),(Int3,Laps3), ...,(Intn,Lapsn))

    where:
    • Inti - inter-repetition interval used before the i-th repetition (it ranges from 1 to 800),
    • Lapsi - number of lapses of memory during the i-th repetition (it ranges from 0 to 40),
    • n - total number of repetitions (it ranges from 3 to 20).

  5. Find the functions f and g described by the formulas:

  6. S(1)=S1

    S(n)=f(S(n-1),Intn,Lapsn)

    Laps(n)=g(S(n-1),Intn)

    where:
    • S(n) - any variable corresponding to the strength of memory after the n-th repetition (compare Chapter 10),
    • Intn - interval used before the n-th repetition; taken from data collected during intermittent learning,
    • Lapsn - number of memory lapses in the n-th repetition; taken from data collected during intermittent learning,
    • Laps(n) - estimation of the number of memory lapses in the n-th repetition (it should correspond with Lapsn)
    • S1 - a constant,

    so as to minimize the function Dev:

    Dev=sqrt((Dev1+Dev2+ ... +Dev161)/RepNo)

    Devi=sqr(Laps(1)-Laps1)+sqr(Laps(2)-Laps2)+ ... +sqr(Laps(n)-Lapsn)

    where:
    • Dev - function that describes the difference between values yielded by the functions f and g, and values collected during intermittent learning (it reflects the difference between experimental and theoretically predicted data)
    • RepNo - total number of repetitions recorded on all pages
    • Devi - component of the function Dev describing the deviation for the i-th page,
    • Laps(j) - number of lapses calculated for the i-th page and j-th repetition using the functions f and g,
    • Lapsj - number of lapses of memory for the i-th page and j-th repetition; taken from data collected during intermittent learning,
    • sqrt(x) - square root of x,
    • sqr(x) - second power of x.

Note that functions f and g will provide a basis for valuable biological considerations only if they are simple and defined by a limited number of parameters (e.g. a*ln()+b or a*exp()+b etc.). Otherwise, one could always construct a gigantic, meaningless formula to automatically put Dev to zero.
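To make the formulation concrete, below is a minimal sketch of the objective in Python (not part of the original thesis). The page histories are invented for illustration, and the deviation is accumulated from the second repetition onward, since Int1 is undefined and Laps(1) cannot be obtained from g:

import math

# Invented example histories: the i-th entry of a page is (Int_i, Laps_i);
# the first interval is undefined (the "-" in the formulation above)
pages = [
    [(None, 12), (7, 5), (30, 8), (120, 11)],
    [(None, 9), (14, 4), (60, 7)],
]

def dev(f, g, S1, pages):
    # root-mean-square deviation between predicted and observed lapse counts
    sq_sum, rep_no = 0.0, 0
    for history in pages:
        S = S1                                      # S(1) = S1
        for interval, lapses in history[1:]:
            sq_sum += (g(S, interval) - lapses)**2  # Laps(n) vs Lapsn
            rep_no += 1
            S = f(S, interval, lapses)              # S(n) = f(S(n-1), Intn, Lapsn)
    return math.sqrt(sq_sum / rep_no)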

Solution to the problem of intermittent learning

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

11.2. Solution to the problem of intermittent learning

In the search for functions f and g that minimize the value of Dev, I used a numerical minimization procedure described in Wozniak, 1988b (A new algorithm for finding local maxima of a function within the feasible region. Credit paper).

Exemplary functions used in the search could look as follows:

S(1)=x[1]

S(n)=x[2]*Intn*exp(-Lapsn*x[3])+x[4]

Laps(n)=x[5]*(1-exp(-Intn/S(n-1)))


where:

  • x[i] - variables that are computed by the minimization procedure,
  • S(n), Laps(n), Lapsn and Intn - as defined in 11.1.

Note that the function f describing S(n) does not use S(n-1) as its argument (the formulation of the problem allows, but does not require, that the new strength be calculated on the base of the previous strength).
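As a sketch only, the search can be reproduced by plugging the exemplary functions into an off-the-shelf minimizer. Here SciPy's Nelder-Mead stands in for the original minimization procedure (Wozniak, 1988b), and dev() and pages come from the earlier sketch; x[0]..x[4] play the role of x[1]..x[5]:

import math
from scipy.optimize import minimize

def objective(x):
    # exemplary f; ignores S(n-1), with a guard keeping strength positive
    f = lambda S, i, l: max(x[1]*i*math.exp(-l*x[2]) + x[3], 1e-6)
    # exemplary g
    g = lambda S, i: x[4]*(1 - math.exp(-i/S))
    return dev(f, g, x[0], pages)                # dev() from the sketch above

result = minimize(objective, x0=[1.0, 0.5, 0.1, 0.5, 40.0], method="Nelder-Mead")
print(result.x, result.fun)                      # fitted parameters and final Dev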

In order to retain simplicity and save time, I set a limit of 12 variables used in the process of minimization.

I tested a great gamut of mathematical functions constructed in accordance with obvious intuitions concerning memory (e.g. that with time passing by, the number of lapses of memory will increase).

These included exponential, logarithmic, power, hyperbolic, sigmoidal, bell-shaped, polynomial and reasonable combinations thereof.

In most cases, the minimization procedure reduced the value of Dev to less than 3, and functions f and g assumed similar shape independent of their nature.

The lowest value of Dev obtained with the use of fewer than 12 variables was 2.887241.

The functions f and g were as follows:

constant S(1)=0.2104031;

function Sn(Intn,Lapsn,S(n-1));
begin
    S(n):=0.4584914*(Intn+1.47)*exp(-0.1549229*Lapsn-0.5854939)+0.35;
    if Lapsn=0 then
        if S(n-1)>Intn then
            S(n):=S(n-1)*0.724994
        else 
            S(n):=Intn*1.1428571;
end;

function Lapsn(Intn,S(n-1));
var quot;
begin
    quot:=(Intn-0.16)/(S(n-1)-0.02)+1.652668;
    Lapsn:=-0.0005408*quot*quot+0.2196902*quot+0.311335;
end;

Without significantly changing the value of Dev, these functions can be easily converted to the following form:

S(1)=1

for Intn>S(n-1): S(n)=1.5*Intn*exp(-0.15*Lapsn)+1

Laps(n)=Intn/S(n-1)

Note that:

  • particular elements of the function were dropped or rounded whenever the operation did not considerably affect the value of Dev,
  • strength was rescaled to allow it to be interpreted as an interval for which the number of lapses equals 1 and the forgetting index equals 2.5% (there are 40 items on a page and 1/40=2.5%),
  • the formula for strength can only be valid if Intn is not much less than S(n-1). This is because the value S(n-1) must be used in the calculation of S(n) if the number of lapses is low, e.g. for Intn<=S(n-1): S(n)=S(n-1)*(1+0.5/(1-exp(-S(n-1)))*(1-exp(-Intn))),
  • the function f intentionally did not involve S(n-1), to avoid recursive accumulation of errors in calculations for successive repetitions (note that the formula used does not consider the history of the process),
  • the formulas cannot be used to describe any process in which intervals are manifold longer than the optimal ones. This is because of the fact that for Intn->∞ the value of Laps(n) exceeds 100%,
  • the formulas describe learning of collective items characterized by more or less uniform distribution of E-Factors. Therefore it cannot be used universally for items of variable difficulty.
As of now, the above formulas make up the best description of the process of intermittent learning, and will later be referred to as the model of intermittent learning (the IL model for short).
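For readers who prefer code, here is a sketch of the simplified IL model in Python (not part of the thesis). The short-interval patch is taken from the notes above, with its parentheses balanced as reconstructed, so treat it as an approximation:

import math

def next_strength(S_prev, interval, lapses):
    # simplified f; the main branch is valid when the interval is not
    # much shorter than the previous strength
    if interval > S_prev:
        return 1.5*interval*math.exp(-0.15*lapses) + 1
    # short-interval patch for Intn <= S(n-1)
    return S_prev*(1 + 0.5/(1 - math.exp(-S_prev))*(1 - math.exp(-interval)))

def expected_lapses(S_prev, interval):
    return interval / S_prev    # simplified g: Laps(n) = Intn/S(n-1)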

Simulations based on the model of intermittent learning

With the formula found above, I could run a whole series of simulation experiments that would help me explore many hypothetical scenarios concerning the behavior of memory in various circumstances. Those simulations shaped the progress of SuperMemo for many years to follow. In particular, the trade-off between workload and retention played a major role in the optimization of learning as of SuperMemo 6 (1991). Until this day, it is the forgetting index (or retrievability) that provides the guiding criterion in learning, not the intuitively natural increase in memory stability that may occur at lower levels of recall. A set level of memory lapses plays the role of the forgetting index below.

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

11.4. Verification of the model of intermittent learning

To verify the consistency of the model of intermittent learning with the SuperMemo theory, let us try to calculate optimal intervals that should separate repetitions.

The optimal interval will be determined by the moment at which the number of lapses reaches a selected value Lapso.

The algorithm proceeds as follows:

  1. i:=1
  2. S(i):=1
  3. Find Int(i+1) such that Laps(i+1) equals Lapso. Use the formula:

    Int(n)=Lapso*S(n-1) (taken from IL model)

    where:
    Int(n) denotes the (n-1)-th optimal interval, i.e. the interval preceding the n-th repetition.
  4. i:=i+1
  5. S(i):=1.5*Int(i)*exp(-0.15*Lapso)+1 (taken from the IL model)
  6. goto 3

If Lapso equals 2.5 (forgetting index 6.25%) and the exact variant of the model of intermittent learning is used then an amazing correspondence can be observed (compare the experiment presented on page 16, Chapter 3.1):

  • Rep - number of the repetition
  • Interval - optimal interval preceding the repetition, determined by Lapso=2.5 on the base of the IL model,
  • Factor - optimal factor equal to the quotient of the optimal interval and previously used optimal interval,
  • SM-0 - optimal interval calculated on the base of experiments leading to the algorithm SM-0
Rep  Interval  Factor  SM-0
2    1.8       -       1
3    7.8       4.36    7
4    16.8      2.15    16
5    30.4      1.80    35
6    50.4      1.66
7    80.2      1.59
8    124       1.55
9    190       1.53
10   288       1.52
11   436       1.51
12   654       1.50
13   981       1.50
14   1462      1.49
15   2179      1.49
16   3247      1.49
17   4838      1.49
18   7209      1.49
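The table above can be approximately regenerated with a short script. The sketch below (not from the thesis) uses the exact-variant constants from the Pascal listing in 11.2; small discrepancies against the published figures may come from rounding in those constants:

import math

A, B, C = -0.0005408, 0.2196902, 0.311335    # quadratic coefficients of g

def exact_lapses(S_prev, interval):
    # g: expected number of lapses after the given interval
    quot = (interval - 0.16)/(S_prev - 0.02) + 1.652668
    return A*quot*quot + B*quot + C

def exact_interval(S_prev, laps_target):
    # inverse of g: interval at which expected lapses reach laps_target
    disc = B*B - 4*A*(C - laps_target)
    quot = (-B + math.sqrt(disc))/(2*A)      # smaller positive root (A < 0)
    return (quot - 1.652668)*(S_prev - 0.02) + 0.16

def exact_strength(interval, lapses):
    # f for a repetition with a non-zero number of lapses
    return 0.4584914*(interval + 1.47)*math.exp(-0.1549229*lapses - 0.5854939) + 0.35

S, prev = 0.2104031, None                    # S(1) from the listing
for rep in range(2, 19):
    interval = exact_interval(S, 2.5)        # Lapso = 2.5
    print(rep, f"{interval:.1f}", f"{interval/prev:.2f}" if prev else "")
    S, prev = exact_strength(interval, 2.5), interval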

Obviously, the exact correspondence is, to some extent, a coincidence, because the experiment leading to the formulation of the algorithm SM-0 was not that sensitive.

It is worth noticing that optimal factors tend to decrease gradually! This fact seems to confirm recent observations based on the analysis of the matrix of optimal factors used in Algorithm SM-5.

If Lapso equals 4 (forgetting index 10%, as in Algorithm SM-5) then the sequence of optimal factors resembles a column of the OF matrix in Algorithm SM-5. Also the knowledge retention matches almost ideally the one found in SM-5 databases.

Rep  Interval  Retention  Factor
2    3         93.21678   -
3    16        93.80946   4.89
4    43        93.97184   2.74
5    102       94.04083   2.39
6    232       94.06886   2.27
7    517       94.08418   2.23
8    1138      94.09256   2.20
9    2502      94.09737   2.20
10   5481      94.09967   2.19

The value of retention was obtained by averaging its value calculated for each day of the optimal process:

R=(R(1)+R(2)+...+R(n))/n

R(d)=100-2.5*Laps(d-dlr)


where:

  • R - average retention
  • R(d) - retention on the d-th day of the process
  • Laps(Int) - expected number of lapses after the interval Int (in R(d), the interval is d-dlr)
  • dlr - day of the process on which the last repetition was scheduled
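A sketch of this averaging (not from the thesis), reusing exact_interval(), exact_strength(), and exact_lapses() from the earlier sketch; the details of the original computation are unknown, so the figures may differ slightly:

def average_retention(lapso, horizon=10000):
    # follow the optimal schedule and average R(d) over every day of the process
    S, day, samples = 0.2104031, 0, []
    while day < horizon:
        interval = max(1, round(exact_interval(S, lapso)))
        for t in range(1, min(interval, horizon - day) + 1):
            samples.append(100 - 2.5*exact_lapses(S, t))  # R(d) = 100 - 2.5*Laps(d-dlr)
        day += interval
        S = exact_strength(interval, lapso)
    return sum(samples)/len(samples)

print(average_retention(4))    # Lapso = 4, i.e. forgetting index 10%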

Workload vs. Retention trade-off

Despite the inaccuracies coming from the heterogeneous material, solid conclusions could be drawn about the impact of the forgetting index on the amount of time needed to invest in learning. Those observations survived the test of time:

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

Very interesting conclusions may be drawn by comparison of retention and workload data calculated by means of the model of intermittent learning:

  • Index - forgetting index (Lapso*2.5) determining optimal intervals in the process of time optimal learning scheduled with the use of the IL model
  • Retention - overall retention obtained while using the given forgetting index (calculated upon elapse of 10,000 days)
  • Repetitions - number of repetitions scheduled in the first 10,000 days of the process when using the given forgetting index,
  • Factor - asymptotic value of the optimal factor (taken from the 10,000-th day of the process)
Index  Retention  Repetitions   Factor
2.5    97.76      every 2 days  1.0000
4.5    96.88      65            1.0300
5.0    96.64      30            1.1600
5.5    96.39      22            1.3000
6.25   96.01      17            1.4900
7.5    95.37      13            1.7700
10.0   94.10      10            2.1900
12.5   92.78      9             2.4700
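For illustration, the retention column can be approximated by looping the average_retention() sketch over forgetting-index values (the published figures came from the original 1990 computation, so an exact match should not be expected):

for index in (5.0, 6.25, 7.5, 10.0, 12.5):
    lapso = index/2.5          # forgetting index = Lapso*2.5 (in %)
    print(index, f"{average_retention(lapso):.2f}")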

Figure 11.2 demonstrates that the forgetting index used in determination of optimal intervals should fall into the range 5 to 10%.

Workload-retention trade-off

Fig. 11.2. Workload-retention trade-off: On one hand, if the forgetting index is lower than 5%, then the workload increases dramatically without substantially affecting retention. On the other hand, above a forgetting index of 10%, the workload hardly changes while retention steadily falls. Obviously, the workload-retention trade-off corresponds directly to the compromise between the acquisition rate and retention. By increasing the availability of time X times (by decreasing the workload X times), one can increase the acquisition rate X times (compare Chapter 5). Note that the relation of the forgetting index and retention in this model is almost linear. (source: Optimization of learning: Model of intermittent learning, Piotr Wozniak, 1990)

Another important observation comes from the calculation of the forgetting index for which the increase of strength is the greatest.

From the model of intermittent learning it follows that

S(n)=1.5*Laps(n)*S(n-1)*exp(-0.15*Laps(n))+1

Upon differentiation with respect to the variable Laps(n), we arrive at:

S'(n)=1.5*S(n-1)*exp(-0.15*Laps(n))*(1-0.15*Laps(n))

Finally, after equating the derivative with zero, we obtain:

Laps(n)=7.8

which corresponds to the forgetting index equal to 20%! Such a forgetting index is equivalent to intervals 2 times longer than the optimal ones determined by the index equal to 10% (as in Algorithm SM-5). However, it must not be forgotten that it is knowledge retention, and not the strength of memory, that is the only factor traded for workload. Therefore, the above finding does not abolish the validity of Algorithm SM-5.

Conclusions: model of intermittent learning

The ultimate conclusions drawn at the end of the chapter have stood the test of three decades. Only the claim about the non-exponential shape of forgetting curves is inaccurate. As the entire model was based on heterogeneous data, the exponential nature of forgetting could not have been revealed.

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

Interim summary

  • The model of intermittent learning was constructed making it possible to estimate knowledge retention upon different repetition schedules
  • The model strongly indicates that the forgetting curve is not exponential [comment 2018: wrong conclusion: compare Exponential nature of forgetting]
  • The model satisfactorily corresponds to experimental data
  • With a striking accuracy, the model approximates optimal intervals and knowledge retention implied by the SuperMemo model
  • The model indicates that optimal factors decrease in successive repetitions and asymptotically approach the ultimate value
  • The model indicates that the desirable value of the forgetting index used in time optimal learning should fall into the range 5% to 10%
  • The model indicates an almost linear relation of the forgetting index and knowledge retention
  • The model shows that the greatest increase of the strength of memory occurs when intervals are approximately 2 times longer than those used in the SuperMemo method. This is equivalent to the forgetting index equal to 20%