Search for a universal memory formula
This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)
Optimum review vs. intermittent review
By 1990, I had no doubt: I had a major discovery at hand. I had cracked the problem of forgetting. I knew the optimum timing of review for simple memories. Once I secured permission to describe my findings in my Master's Thesis, my appetite for discovery kept growing. I hoped I might find a universal formula for long-term memory: a formula that would help me track the behavior of memory for any pattern of exposure or retrieval.
I already had a collection of data that might help me find the formula. Before discovering the optimum spacing of repetitions in 1985, I had used pages of questions for the review of knowledge. The review was chaotic, determined by the availability of time, the need, or the mood. I called that "intermittent learning". I had recall data for individual pages and for each review. That was ideal data: it did not have the periodicity of SuperMemo, and it was exactly the kind of data needed to solve the problem of memory. However, I had that data on paper only.
In Spring 1990, I recruited my sister to do the typing. No. I do not have a younger sister who would do that eagerly. My sister was 17 years my senior. Being a bit inconsiderate of her time, I used her love to make her do the donkey work. I feel guilty about it. She died just two years later. I never had a chance to repay her contribution to the theory of spaced repetition, which she never even had a chance to understand. Starting on May 1, 1990, she used my time away from the PC to transfer the data from paper to the computer. It took her many days of slow typing. It was worth it.
Model of intermittent learning
Throughout the summer of 1990, instead of focusing on my Master's Thesis, I worked on the "model of intermittent learning". It was not unusual for me to work for 10 hours straight, or go to sleep at 7 am empty-handed, or leave the computer churning the numbers overnight.
Persistence and tinkering pay. Only teens can afford it and should be given the space and the freedom. Despite being 28 years old, I was being tolerated at home pretty well. Like an immature teen. I lived at my sister's apartment where I could leech on her kindness. Long hours at the computer were excused as "working on my Master's Thesis". The truth was nobody asked me to do it, nobody demanded it, it did not even push SuperMemo much ahead. It was a sheer case of scientific curiosity. I just wanted to know how memory works.
I had dozens of pages of questions and their repetition history. I tried to predict "memory lapses per page". I used root-mean-square deviation for lapse prediction (denoted below as Dev). By Jul 10, 1990, Tuesday, I reached Dev<3 and felt like the problem was almost "solved." On Jul 12, 1990, I improved to Dev=2.877 (incidentally, my Thesis speaks of 2.887241). However, by Aug 27, 1990, I declared the problem unsolvable. My notes from that day say:
On Aug 30, 1990, I described the model for my Master's Thesis. The text covered 15 pages that do not make for good reading. I bet nobody has ever had the patience to read it all. That chapter was not even published at supermemo.com when my Master's Thesis was put on-line in excerpts in the late 1990s.
However, the conclusions drawn on the basis of the model had a profound effect on my thinking about memory in the decades that followed. The whole idea behind the model is actually reminiscent of the optimizations used to deliver Algorithm SM-17 (2014-2016).
When I declared the problem unsolvable, I meant that I could not accurately describe the memory of "difficult pages", as heterogeneous materials require more complex models. However, my notes from Aug 31, 1990 sound far more optimistic:
- optimal factors decrease with successive intervals (previously I had an intuition that it is so),
- for the forgetting index equal to 10%, the retention is 94% (as in the EVF database),
- retention is in a linear relation to the forgetting index [comment 2018: in a small range for heterogeneous material] (this could not be calculated from my simulation experiments carried in January)
- the model says that the desirable value of the forgetting index is 5-10% (workload-retention trade-off)
- strength of memory increases most if the interval is twice as long as the optimal one!!!
- the strength of memory increases most if the forgetting index is 20%.
Past (1990) vs. Present (2018)
Conclusions at the end of the chapter, and the procedure itself, are reminiscent of the methodology I used in 2005 when looking for the universal formula for memory stability increase, and in 2014, when Algorithm SM-17 was based on a far more accurate mathematical description of memory. Like the newest SuperMemo algorithm, the model made it possible to compute retention for any repetition schedule. Naturally, it was far less accurate, as it was based on inferior data. Moreover, what SuperMemo 17 does in real time took many hours of PC time back in 1990.
This old, seemingly boring portion of my Master's Thesis has grown in importance by now. I dare say that only inferior data separated that work from Algorithm SM-17, which emerged 25 long years later. I quote the text with minor notational and stylistic improvements, without the chapter about forgetting curves, which was erroneous due to the highly heterogeneous material used in computations:
Model of intermittent learning
The SuperMemo model provides a basis for the calculation of optimal intervals that should separate repetitions in the process of time optimal learning.
However, it does not make it possible to predict the changes of memory variables if repetitions are done at irregular intervals.
Below I present an attempt to augment the SuperMemo model so that it can be used in the description of the process of intermittent learning.
In Chapter 3, I mentioned the way in which I had learned English and biology before Algorithm SM-0 was developed.
Data collected during that time (1982-1984) provide an excellent basis for the construction of the model of intermittent learning. Items, formulated in compliance with the minimum information principle (usually having the form of pairs of words) were grouped in pages subject to the irregular review process.
The collected data, available in computer-readable form, include the description of repetitions of 71 pages and, in addition, 80 similar pages participating in a process supervised by the SM-0 schedule.
Similarity to Algorithm SM-17
Note that the formulation of the problem is reminiscent of the procedure used to compute the stability increase matrix (SInc[]) in Algorithm SM-17. Memory stability was rescaled to make it possible to interpret it as an interval. Even the symbols are similar: S for stability and D for deviation. Page lapses substituted for retrievability.
I loved playing with various optimization algorithms. You can still visually observe in SuperMemo 17 how the algorithm runs surface fitting optimizations (see picture). Doing it with 12 variables might have been a bit inefficient, but I never cared about the method as long as I got interesting results that provided new insights into how memory works.
For the sake of those familiar with Algorithm SM-17, the notation in the text below has been changed. In addition, symbols such as In and Ln, which in print could easily be misread as logarithms, have been replaced.
The list of changes:
- Ln -> Lapsn
- In -> Intn
- Dn -> Devn
- R -> RepNo
Formulation of the problem of intermittent learning
11.1. Formulation of the problem of intermittent learning
- There are 161 pages.
- Each page contains about 40 items.
- For each page, the description of the learning process (collected during experimental repetitions) has the following form:
- ((-,Laps1),(Int2,Laps2),(Int3,Laps3), ...,(Intn,Lapsn))
- where:
- Inti - inter-repetition interval used before the i-th repetition (it ranges from 1 to 800),
- Lapsi - number of lapses of memory during the i-th repetition (it ranges from 0 to 40),
- n - total number of repetitions (it ranges from 3 to 20).
- Find the functions f and g described by the formulas:
- S(1)=S1
- S(n)=f(S(n-1),Intn,Lapsn)
- Laps(n)=g(S(n-1),Intn)
- where:
- S(n) - any variable corresponding to the strength of memory after the n-th repetition (compare Chapter 10),
- Intn - interval used before the n-th repetition; taken from data collected during intermittent learning,
- Lapsn - number of memory lapses in the n-th repetition; taken from data collected during intermittent learning,
- Laps(n) - estimation of the number of memory lapses in the n-th repetition (it should correspond with Lapsn)
- S1 - a constant,
- so as to minimize the function Dev (sketched in code after this list):
- Dev=sqrt((Dev1+Dev2+ ... +Dev161)/RepNo)
- Devi=sqr(Laps(1)-Laps1)+sqr(Laps(2)-Laps2)+ ... +sqr(Laps(n)-Lapsn)
- where:
- Dev - function that describes the difference between values yielded by the functions f and g, and values collected during intermittent learning (it reflects the difference between experimental and theoretically predicted data)
- RepNo - total number of repetitions recorded on all pages
- Devi - component of the function Dev describing the deviation for the i-th page,
- Laps(j) - number of lapses calculated for the i-th page and j-th repetition using the functions f and g,
- Lapsj - number of lapses of memory for the i-th page and j-th repetition; taken from data collected during intermittent learning,
- sqrt(x) - square root of x,
- sqr(x) - second power of x.
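In modern terms, the objective is easy to state in code. Below is a minimal Python sketch of the Dev measure (my own transcription, not the original program); f, g, and the page histories are passed in as arguments, and, since the first repetition carries no preceding interval, the sum runs over repetitions with a recorded interval:

```python
import math

def dev(pages, f, g, s1):
    """Root-mean-square deviation (Dev) between predicted and recorded lapses.

    pages: histories of the form [(None, Laps1), (Int2, Laps2), ..., (Intn, Lapsn)]
    f(s_prev, interval, lapses) -> new strength S(n)
    g(s_prev, interval)         -> predicted lapses Laps(n)
    s1: the constant S(1)
    """
    total_sq, rep_no = 0.0, 0
    for page in pages:
        s = s1                             # strength after the first repetition
        for interval, lapses in page[1:]:  # the first entry carries no interval
            total_sq += (g(s, interval) - lapses) ** 2   # sqr(Laps(n) - Lapsn)
            s = f(s, interval, lapses)
            rep_no += 1
    return math.sqrt(total_sq / rep_no)    # sqrt(sum of Dev_i / RepNo)
```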
Solution to the problem of intermittent learning
11.2. Solution to the problem of intermittent learning
In the search for functions f and g that minimize the value of Dev, I used a numerical minimization procedure described in Wozniak, 1988b (A new algorithm for finding local maxima of a function within the feasible region. Credit paper).
Exemplary functions used in the search could look as follows:
S(1)=x[1]
S(n)=x[2]*Intn*exp(-Lapsn*x[3])+x[4]
Laps(n)=x[5]*(1-exp(-Intn/S(n-1)))
where:
- x[i] - variables that are computed by the minimization procedure,
- S(n), Laps(n), Lapsn and Intn - as defined in 11.1.
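To give a feel for the search, here is a hedged sketch of how these exemplary functions could be fitted today with an off-the-shelf optimizer (scipy's Nelder-Mead simplex, standing in for the original minimization procedure from Wozniak 1988b). It reuses the dev function sketched in 11.1; Python's x[0..4] correspond to the thesis's x[1..5], and the pages data is a toy stand-in for the 161 real histories:

```python
import math
import numpy as np
from scipy.optimize import minimize

def make_f(x):
    # S(n) = x[2]*Intn*exp(-Lapsn*x[3]) + x[4]   (thesis numbering)
    return lambda s_prev, interval, lapses: x[1] * interval * math.exp(-lapses * x[2]) + x[3]

def make_g(x):
    # Laps(n) = x[5]*(1 - exp(-Intn/S(n-1)))     (thesis numbering)
    # max() guards against degenerate strengths during the search
    return lambda s_prev, interval: x[4] * (1.0 - math.exp(-interval / max(s_prev, 1e-3)))

def objective(x, pages):
    return dev(pages, make_f(x), make_g(x), s1=x[0])   # S(1) = x[1] in the thesis

# toy stand-in for the real page histories
pages = [
    [(None, 5), (3, 4), (10, 6)],
    [(None, 2), (7, 3), (30, 8), (60, 12)],
]
result = minimize(objective, np.array([1.0, 1.0, 0.1, 1.0, 10.0]),
                  args=(pages,), method="Nelder-Mead")
print(result.x, result.fun)
```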
Note that the function f describing S(n) does not use S(n-1) as its argument (the formulation of the problem allows, but does not require, that the new strength be calculated on the basis of the previous strength).
In order to retain simplicity and save time, I set a limit of 12 variables used in the process of minimization.
I tested a great gamut of mathematical functions constructed in accordance with obvious intuitions concerning memory (e.g. that with time passing by, the number of lapses of memory will increase).
These included exponential, logarithmic, power, hyperbolic, sigmoidal, bell-shaped, polynomial and reasonable combinations thereof.
In most cases, the minimization procedure reduced the value of Dev to less than 3, and functions f and g assumed similar shape independent of their nature.
The lowest value of Dev obtained with the use of fewer than 12 variables was 2.887241.
The functions f and g were as follows:
```
constant S(1) = 0.2104031;

function Sn(Intn, Lapsn, S(n-1));
begin
  S(n) := 0.4584914*(Intn+1.47)*exp(-0.1549229*Lapsn-0.5854939)+0.35;
  if Lapsn = 0 then
    if S(n-1) > Intn then
      S(n) := S(n-1)*0.724994
    else
      S(n) := Intn*1.1428571;
end;

function Lapsn(Intn, S(n-1));
var quot;
begin
  quot := (Intn-0.16)/(S(n-1)-0.02)+1.652668;
  Lapsn := -0.0005408*quot*quot+0.2196902*quot+0.311335;
end;
```
Without significantly changing the value of Dev, these functions can be easily converted to the following form:
S(1)=1
for Intn>S(n-1): S(n)=1.5*Intn*exp(-0.15*Lapsn)+1
Laps(n)=Intn/S(n-1)
Note that:
- particular elements of the function were dropped or rounded whenever the operation did not considerably affect the value of Dev,
- strength was rescaled to allow it to be interpreted as an interval for which the number of lapses equals 1 and the forgetting index equals 2.5% (there are 40 items on a page and 1/40=2.5%),
- the formula for strength is valid only if Intn is not much less than S(n-1). This is because the value S(n-1) must be used in the calculation of S(n) if the number of lapses is low, e.g. for Intn<=S(n-1): S(n)=S(n-1)*(1+0.5/(1-exp(-S(n-1)))*(1-exp(-Intn))),
- the function f intentionally does not involve S(n-1), so as to avoid recursive accumulation of errors in calculations for successive repetitions (note that the formula used does not consider the history of the process),
- the formulas cannot be used to describe any process in which intervals are manifold longer than the optimal ones. This is because, for Intn->∞, the value of Laps(n) exceeds 100%,
- the formulas describe the learning of collective items characterized by a more or less uniform distribution of E-Factors. Therefore, they cannot be used universally for items of variable difficulty.
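For convenience, here is the simplified model transcribed into Python. This is a minimal sketch of my own; the short-interval branch uses the formula reconstructed in the caveat above:

```python
import math

def strength(s_prev, interval, lapses):
    """Simplified f: memory strength after a repetition (in interval units)."""
    if interval > s_prev:
        return 1.5 * interval * math.exp(-0.15 * lapses) + 1.0
    # short-interval branch (see the caveat above): no gain as interval -> 0,
    # and a 1.5x gain as the interval approaches s_prev
    return s_prev * (1.0 + 0.5 / (1.0 - math.exp(-s_prev)) * (1.0 - math.exp(-interval)))

def predicted_lapses(s_prev, interval):
    """Simplified g: expected lapses on a 40-item page after `interval` days."""
    return interval / s_prev
```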
Simulations based on the model of intermittent learning
With the formula found above, I could run a whole series of simulation experiments to explore hypothetical scenarios of memory behavior in various circumstances. Those simulations shaped the progress of SuperMemo for many years to follow. In particular, the trade-off between workload and retention played a major role in the optimization of learning as of SuperMemo 6 (1991). To this day, it is the forgetting index (or retrievability) that provides the guiding criterion in learning, not the intuitively natural increase in memory stability that may occur at lower levels of recall. A set level of memory lapses plays the role of the forgetting index below.
11.4. Verification of the model of intermittent learning
To verify the consistency of the model of intermittent learning with the SuperMemo theory, let us try to calculate optimal intervals that should separate repetitions.
The optimal interval will be determined by the moment at which the number of lapses reaches a selected value Lapso.
The algorithm proceeds as follows (it is replayed in code below):
1. i := 1
2. S(i) := 1
3. Find Int(i+1) such that Laps(i+1) equals Lapso, using the formula Int(n)=Lapso*S(n-1) (taken from the IL model), where Int(n) denotes the (n-1)-th optimal interval
4. i := i+1
5. S(i) := 1.5*Int(i)*exp(-0.15*Lapso)+1 (taken from the IL model)
6. goto 3
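The loop is easy to replay. The sketch below (my reconstruction, not the original program) uses the exact variant of the fitted functions from 11.2, including their fitted constant S(1)=0.2104031 rather than the rescaled S(1)=1, and inverts the quadratic in g analytically to find the interval at which expected lapses reach Lapso; with Lapso=2.5 it closely reproduces the table below:

```python
import math

def exact_strength(interval, lapses):
    # exact variant of f found by the minimization (see 11.2);
    # the Lapsn=0 branch never triggers here, so it is omitted
    return 0.4584914 * (interval + 1.47) * math.exp(-0.1549229 * lapses - 0.5854939) + 0.35

def interval_for_lapses(s_prev, laps_o):
    # invert the exact g: quot = (Int-0.16)/(S-0.02) + 1.652668,
    # laps = -0.0005408*quot^2 + 0.2196902*quot + 0.311335; solve laps = laps_o
    a, b, c = -0.0005408, 0.2196902, 0.311335 - laps_o
    quot = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # the physically relevant root
    return (quot - 1.652668) * (s_prev - 0.02) + 0.16

laps_o = 2.5                 # 2.5 lapses out of 40 items = forgetting index 6.25%
s, prev = 0.2104031, None    # S(1) as fitted by the minimization
for rep in range(2, 19):
    interval = interval_for_lapses(s, laps_o)
    factor = f"{interval / prev:.2f}" if prev else "-"
    print(f"{rep:>3} {interval:9.1f} {factor:>6}")
    s, prev = exact_strength(interval, laps_o), interval
```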
If Lapso equals 2.5 (forgetting index 6.25%) and the exact variant of the model of intermittent learning is used, then an amazing correspondence can be observed (compare the experiment presented on page 16, Chapter 3.1):
- Rep - number of the repetition,
- Interval - optimal interval preceding the repetition, determined by Lapso=2.5 on the basis of the IL model,
- Factor - optimal factor equal to the quotient of the optimal interval and the previously used optimal interval,
- SM-0 - optimal interval calculated on the basis of experiments leading to Algorithm SM-0
Rep | Interval (days) | Factor | SM-0 (days) |
---|---|---|---|
2 | 1.8 | 1 | |
3 | 7.8 | 4.36 | 7 |
4 | 16.8 | 2.15 | 16 |
5 | 30.4 | 1.80 | 35 |
6 | 50.4 | 1.66 | |
7 | 80.2 | 1.59 | |
8 | 124 | 1.55 | |
9 | 190 | 1.53 | |
10 | 288 | 1.52 | |
11 | 436 | 1.51 | |
12 | 654 | 1.50 | |
13 | 981 | 1.50 | |
14 | 1462 | 1.49 | |
15 | 2179 | 1.49 | |
16 | 3247 | 1.49 | |
17 | 4838 | 1.49 | |
18 | 7209 | 1.49 |
Obviously, the exact correspondence is, to some extent, a coincidence, because the experiment leading to the formulation of Algorithm SM-0 was not that sensitive.
It is worth noticing that optimal factors tend to decrease gradually! This fact seems to confirm recent observations based on the analysis of the matrix of optimal factors used in Algorithm SM-5.
If Lapso equals 4 (forgetting index 10%, as in Algorithm SM-5), then the sequence of optimal factors resembles a column of the OF matrix in Algorithm SM-5. The knowledge retention also matches almost ideally that found in SM-5 databases.
Rep | Interval (days) | Retention (%) | Factor |
---|---|---|---|
2 | 3 | 93.21678 | |
3 | 16 | 93.80946 | 4.89 |
4 | 43 | 93.97184 | 2.74 |
5 | 102 | 94.04083 | 2.39 |
6 | 232 | 94.06886 | 2.27 |
7 | 517 | 94.08418 | 2.23 |
8 | 1138 | 94.09256 | 2.20 |
9 | 2502 | 94.09737 | 2.20 |
10 | 5481 | 94.09967 | 2.19 |
The value of retention was obtained by averaging the values calculated for each day of the optimal process:
R=(R(1)+R(2)+...+R(n))/n
R(d)=100-2.5*Laps(d-dlr)
where:
- R - average retention
- R(d) - retention on the d-th day of the process
- Laps(Int) - expected number of lapses after the interval Int,
- dlr - day of the process on which the last repetition was scheduled,
- n - number of days in the process
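The same machinery can replay these figures. The following sketch is an approximation under my own assumptions (daily sampling, rounded intervals, the exact variant of the model); it reuses exact_strength and interval_for_lapses from the previous sketch, averages R(d)=100-2.5*Laps(d-dlr) over the days of the optimal process, and also counts repetitions, roughly reproducing the retention and workload columns of the tables in this chapter:

```python
def schedule_stats(laps_o, horizon=10_000):
    """Average daily retention (%) and repetition count over `horizon` days."""
    def expected_lapses(s_prev, t):
        # the exact g evaluated t days after the last repetition
        quot = (t - 0.16) / (s_prev - 0.02) + 1.652668
        return -0.0005408 * quot ** 2 + 0.2196902 * quot + 0.311335

    s, day, total_r, reps = 0.2104031, 0, 0.0, 1   # repetition 1 falls on day 0
    while day < horizon:
        interval = max(1, round(interval_for_lapses(s, laps_o)))
        for t in range(1, interval + 1):           # days since the last repetition
            total_r += 100.0 - 2.5 * expected_lapses(s, t)
            day += 1
            if day == horizon:
                return total_r / horizon, reps
        s = exact_strength(interval, laps_o)
        reps += 1
    return total_r / horizon, reps

print(schedule_stats(4.0))   # forgetting index 10%: roughly 94% retention, ~10 repetitions
```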
Workload vs. Retention trade-off
Despite the inaccuracies coming from the heterogeneous material, solid conclusions could be drawn about the impact of the forgetting index on the amount of time that must be invested in learning. Those observations survived the test of time:
Very interesting conclusions may be drawn by comparison of retention and workload data calculated by means of the model of intermittent learning:
- Index - forgetting index (Lapso*2.5) determining optimal intervals in the process of time optimal learning scheduled with the use of the IL model
- Retention - overall retention obtained while using the given forgetting index (calculated upon elapse of 10,000 days)
- Repetitions - number of repetitions scheduled in the first 10,000 days of the process when using the given forgetting index,
- Factor - asymptotic value of the optimal factor (taken from the 10,000-th day of the process)
Index (%) | Retention (%) | Repetitions | Factor |
---|---|---|---|
2.5 | 97.76 | every 2 days | 1.0000 |
4.5 | 96.88 | 65 | 1.0300 |
5.0 | 96.64 | 30 | 1.1600 |
5.5 | 96.39 | 22 | 1.3000 |
6.25 | 96.01 | 17 | 1.4900 |
7.5 | 95.37 | 13 | 1.7700 |
10.0 | 94.10 | 10 | 2.1900 |
12.5 | 92.78 | 9 | 2.4700 |
Figure 11.2 demonstrates that the forgetting index used in determination of optimal intervals should fall into the range 5 to 10%.
Fig. 11.2. Workload-retention trade-off: On the one hand, if the forgetting index is lower than 5%, the workload increases dramatically without substantially affecting retention. On the other hand, above a forgetting index of 10%, the workload hardly changes while retention steadily falls. Obviously, the workload-retention trade-off corresponds directly to the compromise between the acquisition rate and retention. By increasing the availability of time X times (by decreasing the workload X times), one can increase the acquisition rate X times (compare Chapter 5). Note that the relation of the forgetting index and retention in this model is almost linear. (source: Optimization of learning: Model of intermittent learning, Piotr Wozniak, 1990)
Another important observation comes from the calculation of the forgetting index for which the increase of strength is the greatest.
From the model of intermittent learning, after substituting Intn=Laps(n)*S(n-1) (from the formula for Laps(n)) into the formula for strength, it follows that
S(n)=1.5*Laps(n)*S(n-1)*exp(-0.15*Laps(n))+1
Upon differentiation with respect to the variable Laps(n), we arrive at:
S'(n)=1.5*S(n-1)*exp(-0.15*Laps(n))*(1-0.15*Laps(n))
Finally, after equating with zero, we obtain:
Laps(n)=7.8
which corresponds to the forgetting index equal to 20%! Such a forgetting index is equivalent to intervals 2 times longer than the optimal ones determined by the index equal to 10% (as in Algorithm SM-5). However, it must not be forgotten that it is knowledge retention, and not the strength of memory, that is the only factor traded for workload. Therefore, the above finding does not abolish the validity of Algorithm SM-5.
Conclusions: model of intermittent learning
The ultimate conclusions drawn at the end of the chapter stood the test of three decades. Only the claim about the non-exponential shape of forgetting curves is inaccurate. As the entire model was based on heterogeneous data, the exponential nature of forgetting could not have been revealed.
Interim summary
- The model of intermittent learning was constructed, making it possible to estimate knowledge retention under different repetition schedules
- The model strongly indicates that the forgetting curve is not exponential [comment 2018: wrong conclusion: compare Exponential nature of forgetting]
- The model satisfactorily corresponds to experimental data
- With a striking accuracy, the model approximates optimal intervals and knowledge retention implied by the SuperMemo model
- The model indicates that optimal factors decrease in successive repetitions and approach an asymptotic value
- The model indicates that the desirable value of the forgetting index used in time optimal learning should fall into the range 5% to 10%
- The model indicates an almost linear relation of the forgetting index and knowledge retention
- The model shows that the greatest increase in the strength of memory occurs when intervals are approximately 2 times longer than those used in the SuperMemo method. This is equivalent to the forgetting index equal to 20%