First fast-converging spaced repetition algorithm: Algorithm SM-5


This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)

SuperMemo 5

The idea of the matrix of optimum intervals was born on Feb 11, 1989. SuperMemo would keep a collection of optimum intervals for different levels of memory stability and difficulty. It would modify the intervals with the flow of new data. Using this new idea, on Mar 1, 1989, I started using SuperMemo 4 to learn Esperanto. Very early on I noticed that the idea needed a comprehensive revision. The convergence of the algorithm was excruciatingly slow.

By May 5, 1989, I had a new idea in mind. This was, in essence, the birth of the stability increase function, except that in the optimum review of SuperMemo there would be no retrievability dimension. The new algorithm would use the matrix of optimum factors. Instead of remembering optimum intervals, it would remember how much intervals should increase depending on memory complexity and current memory stability. The slow convergence of Algorithm SM-4 also inspired the need to randomize intervals (May 20, 1989).

In the meantime, progress on SuperMemo had to be delayed again because of school obligations. Together with Krzysztof Biedalak, we decided to write a program for developing school tests that could be used with SuperMemo. The project was again used to wriggle out of other obligations in classes with the open-minded Dr Katulski, who had by then become a supporter of all things SuperMemo.

I spent the summer on practical training in the Netherlands, where progress was slow due to various obligations (including compulsory excursions!). One of the chief reasons for the slowness was extreme dieting that resulted from the need to save money to pay off my PC 1512 debts. I also hoped to earn something extra to buy a hard disk for my computer. All my work was possible thanks to the courtesy of Peter Klijn of the University of Eindhoven. He simply gave me his PC for my private use for the whole 2-month period of my stay. He did not want my good work on SuperMemo to slow down. It was the first time I could keep all my files on a hard disk, and it felt like a move from an old bike to a Tesla Model S.

The new Algorithm SM-5 was completed only on Oct 16, 1989, and I started using SuperMemo 5. I remarked in my notes: "A great revolution is in the offing". The progress was tremendous indeed.

I had a couple of users of SuperMemo 2 who were ready to upgrade to SuperMemo 5. I demanded only one price: they would start with the matrix of optimum factors initialized to a specific value. This was to validate the algorithm and make sure it carried no preconceived prejudice. All preset optimization matrices converged nicely and fast. This fact was then used to claim universal convergence, and the algorithm was described as such in the first-ever publication about spaced repetition algorithms. Do not read that publication! It has been emasculated by peer review. Read below to understand Algorithm SM-5.

Algorithm SM-5

SuperMemo uses a simple principle: "use, verify, and correct". After a repetition, a new interval is computed with the help of the OF matrix. The relevant entry used to compute the interval depends on the repetition number and the item difficulty category. After the interval elapses, SuperMemo calls for the next repetition. The grade is used to tell SuperMemo how well the interval "performed". If the grade is low, we have reasons to believe that the interval is too long and the OF matrix entry is too high. In such cases, we slightly reduce the OF entry. The relevant entry here is the one that was previously used in computing the interval (i.e. before the interval started). In other words, it is the entry that is (1) used to compute the interval (after the n-th repetition) and then (2) used to correct the OF matrix (after the (n+1)-th repetition).

Here is an outline of Algorithm SM-5 as presented in my Master's Thesis:

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

The final formulation of the algorithm used in SuperMemo 5 is presented below (Algorithm SM-5):

  1. Split the knowledge into smallest possible items
  2. With all items associate an E-Factor equal to 2.5
  3. Tabulate the OF matrix for various repetition numbers and E-Factor categories. Use the following formula:
    OF(1,EF):=4
    for n>1 OF(n,EF):=EF
    where:
    • OF(n,EF) - optimal factor corresponding to the n-th repetition and the E-Factor EF
  4. Use the OF matrix to determine inter-repetition intervals:
    I(n,EF)=OF(n,EF)*I(n-1,EF)
    I(1,EF)=OF(1,EF)
    where:
    • I(n,EF) - the n-th inter-repetition interval for an item of a given E-Factor EF (in days)
    • OF(n,EF) - the entry of the OF matrix corresponding to the n-th repetition and the E-Factor EF
  5. After each repetition assess the quality of repetition responses in the 0-5 grade scale (cf. Algorithm SM-2)
  6. After each repetition modify the E-Factor of the recently repeated item according to the formula:
    EF':=EF+(0.1-(5-q)*(0.08+(5-q)*0.02))
    where:
    • EF' - new value of the E-Factor
    • EF - old value of the E-Factor
    • q - quality of the response in the 0-5 grade scale
    If EF is less than 1.3 then set EF to 1.3
  7. After each repetition modify the relevant entry of the OF matrix. Exemplary formulas constructed arbitrarily and used in the modification could look like this:
    OF':=OF*(0.72+q*0.07)
    OF'':=(1-fraction)*OF+fraction*OF'
    where:
    • OF'' - new value of the OF entry
    • OF' - auxiliary value of the OF entry used in calculations
    • OF - old value of the OF entry
    • fraction - any number between 0 and 1 (the greater it is the faster the changes of the OF matrix)
    • q - quality of the response in the 0-5 grade scale
    Note that for q=4 the OF does not change. It increases for q>4 and decreases for q<4.
  8. If the quality response was lower than 3 then start repetitions for the item from the beginning without changing the E-Factor
  9. After each repetition session of a given day repeat again all items that scored below four in the quality assessment. Continue the repetitions until all of these items score at least four

In accordance with previous observations, the entries of the OF matrix were not allowed to drop below 1.2. In Algorithm SM-5, by definition, intervals cannot get shorter in successive repetitions. Intervals are at least 1.2 times as long as their predecessors. Changing the E-Factor category increases the next applied interval only as many times as it is required by the corresponding entry of the OF matrix.
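For illustration only, the core of the steps above can be sketched in Python as follows. This sketch is not part of the thesis text: the matrix dimensions, the fraction value of 0.5, and all identifiers are assumptions made for the example.

# Illustrative sketch of Algorithm SM-5 (steps 2-7); names and sizes are assumptions
EF_MIN, EF_MAX, EF_STEP = 1.3, 2.5, 0.1
OF_FLOOR = 1.2        # OF entries are not allowed to drop below 1.2
FRACTION = 0.5        # speed of OF adaptation; any value between 0 and 1
MAX_REPS = 20         # assumed number of repetition rows kept in the matrix

def ef_category(ef):
    """Map an E-Factor to a column index of the OF matrix."""
    ef = min(max(ef, EF_MIN), EF_MAX)
    return round((ef - EF_MIN) / EF_STEP)

# Step 3: tabulate the OF matrix: OF(1,EF)=4; OF(n,EF)=EF for n>1
OF = [[4.0 if n == 0 else EF_MIN + c * EF_STEP
       for c in range(ef_category(EF_MAX) + 1)]
      for n in range(MAX_REPS)]

def next_interval(n, ef, previous_interval=0.0):
    """Step 4: I(1,EF)=OF(1,EF); I(n,EF)=OF(n,EF)*I(n-1,EF) for n>1 (days)."""
    of = OF[n - 1][ef_category(ef)]
    return of if n == 1 else of * previous_interval

def update_ef(ef, q):
    """Step 6: modify the E-Factor from the 0-5 grade q; keep EF >= 1.3."""
    ef = ef + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    return max(ef, EF_MIN)

def update_of_entry(n, ef, q):
    """Step 7: correct the OF entry that produced the interval just verified,
    i.e. the entry selected before the interval started."""
    row, col = n - 1, ef_category(ef)
    of_old = OF[row][col]
    of_aux = of_old * (0.72 + q * 0.07)    # unchanged for q=4
    OF[row][col] = max((1 - FRACTION) * of_old + FRACTION * of_aux, OF_FLOOR)

Steps 8 and 9 (re-learning items graded below 3 and repeating all sub-4 items at the end of the day), as well as matrix smoothing and the random dispersal of intervals discussed further below, are omitted from the sketch.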

Criticism of Algorithm SM-5

The Anki manual includes a passage that is unexpectedly critical of Algorithm SM-5 (Apr 2018). The words are particularly surprising as Algorithm SM-5 has never been published in full (the version above is just a rough outline). Despite the fact that the words of criticism were clearly uttered in good will, they hint at the possibility that if Algorithm SM-2 was superior to Algorithm SM-5, perhaps it is also superior to Algorithm SM-17. If that were the case, I would have wasted the last 30 years of research and programming. To this day, Wikipedia "criticises" "SM3+". "SM3+" is a label first used in the Anki manual that has since been repeated at dozens of sites on the web (esp. those that prefer to stick with the older algorithm for its simplicity). A comparison between Algorithm SM-2 and Algorithm SM-17 is presented here. Hopefully it leaves no doubt.

Erroneous claim in Anki manual

My Master's Thesis, published in excerpts in 1998 at supermemo.com, included only a rough description of Algorithm SM-5. For the sake of clarity, dozens of minor procedures were not published. Those procedures would require a lot of tinkering to ensure good convergence, stability, and accuracy. This type of tinkering requires months of learning combined with analysis. There has never been a ready out-of-the-box version of Algorithm SM-5.

The source code of Algorithm SM-5 has never been published or opened, and the original algorithm could only be tested by users of SuperMemo 5, in MS DOS. SuperMemo 5 became freeware in 1993. It is important to notice that random dispersal of intervals around the optimum value was essential for establishing convergence. Without the dispersal, the progression of the algorithm would be agonizingly slow. Similarly, matrix smoothing was necessary for consistent behavior independent of the scarcity of data collected for different levels of stability and item complexity.

Multiple evaluations done in 1989, and since, have pointed to an unquestionable superiority of Algorithm SM-5 and later algorithms over Algorithm SM-2 in any metric studied. Algorithm SM-17 might actually be used to measure the efficiency of Algorithm SM-5 if we had volunteers to re-implement that ancient code for use with our universal metric. We have done comparisons for Algorithm SM-2 thus far. The implementation cost was simply insignificant. Needless to say, Algorithm SM-2 lags well behind in its predictive powers, esp. for suboptimum levels of retrievability (i.e. situations in which the user does not religiously stick to the prescribed schedule).

Even a basic understanding of the underlying model should make it clear that a good implementation would yield dramatic benefits. SuperMemo 5 would adapt its function of intervals to the user's memory. SuperMemo 2 was set in stone. I am very proud that the wild guesses made in 1985 and 1987 stood the test of time, but no algorithm should trust the judgment of a humble student with 2 years' experience in the implementation of spaced repetition algorithms. Instead, SuperMemo 4 and all successive implementations made fewer and fewer guesses, and provided better and faster adaptability. Of all those implementations, only SuperMemo 4 was slow to adapt and was replaced in 7 months by a superior solution.

There is no ill will in Anki's criticism, but I would not be surprised if the author pushed for an implementation and a speedy move to self-learning rather than spending time on tinkering with procedures that did not seem to work as he expected. In contrast, back in 1989, I knew Algorithm SM-2 was flawed, I knew Algorithm SM-5 was superior, and I spared no time or effort in making sure the new concept was perfected to its maximum theoretical potential.

Excerpt from the Anki manual (April 2018):

Anki was originally based on the SuperMemo SM5 algorithm. However, Anki’s default behaviour of revealing the next interval before answering a card revealed some fundamental problems with the SM5 algorithm. The key difference between SM2 and later revisions of the algorithm is this:

  • SM2 uses your performance on a card to determine the next time to schedule that card
  • SM3+ use your performance on a card to determine the next time to schedule that card, and similar cards

The latter approach promises to choose more accurate intervals by factoring in not just a single card’s performance, but the performance as a group. If you are very consistent in your studies and all cards are of a very similar difficulty, this approach can work quite well. However, once inconsistencies are introduced into the equation (cards of varying difficulty, not studying at the same time every day), SM3+ is more prone to incorrect guesses at the next interval - resulting in cards being scheduled too often or too far in the future.

Furthermore, as SM3+ dynamically adjusts the "optimum factors" table, a situation can often arise where answering "hard" on a card can result in a longer interval than answering "easy" would give. The next times are hidden from you in SuperMemo so the user is never aware of this.

After evaluating the alternatives, the Anki author decided that near-optimum intervals yielded by an SM2 derivative are better than trying to obtain optimum intervals at the risk of incorrect guesses. An SM2 approach is predictable and intuitive to end users, whereas an SM3+ approach hides the details from the user and requires users to trust the system (even when the system may make mistakes in the scheduling)

Of all the above claims, only one is true. SuperMemo 2 is indeed more intuitive. This problem has plagued SuperMemo for years. Each version is more complex, and it is hard to hide some of that complexity from users. We will keep trying though.

This is why Anki's criticism is not justified:

  • the fact that SuperMemo 5 used past performance of all items to maximize performance on new items is an advantage, not a "problem". Even more, that is the key to the power of adaptability
  • inconsistent grading has been a problem for all algorithms. On average, adaptability helps find out the average effect of mis-grading, esp. if inconsistencies are themselves consistent (i.e. the user keeps committing similar offences in similar circumstances)
  • mixed difficulties are handled by SuperMemo 5 much better. While both difficulty and stability increase are coded by E-factor in SuperMemo 2, in SuperMemo 5 and later, those two properties of memory are separated
  • interval predictions have been proven superior in SuperMemo 5, and the claim "more prone to incorrect guesses" can only be explained by errors in implementation
  • lower grades could bring longer intervals if matrix smoothing is not implemented. That part of the algorithm has only been described verbally in my thesis
  • intervals and repetition dates have always been prominently displayed in SuperMemo, even in simpler implementations (e.g. for handheld devices, smartphones, etc.). Nothing is hidden from the user. Most of all, forgetting index and burden statistics are in full view, and make it possible to see if SuperMemo keeps its retention promise, and at what cost to workload
  • the algorithm and all its details were in full display in SuperMemo 5. In particular, changes to the OF matrix would show at each repetition. For anyone with a basic understanding of the algorithm, its operations were fully transparent. Comprehensive tracking could also be implemented in Anki, or other applications, but might require lots of tinkering. It would be costly. In the end, Anki's choice might indeed be good on account of simplicity (in terms of implementation, operations, tracking, debugging, maintenance, user's intuitions, etc.)
  • the complaint of "hiding details" is the polar opposite of the claim that neural networks are superior for spacing repetitions. Unlike a neural network implementation, the operations of Algorithm SM-5 are relatively easy to analyze

Our official response published at supermemopedia.com in 2011 seems pretty accurate today:

Archive warning: Why use literal archives?
It is great that Anki introduces its own innovations while still giving due credit to SuperMemo. It is true that SuperMemo's Algorithm SM-2 works great as compared with, for example, Leitner system, or SuperMemo on paper. However, the superiority of Algorithm SM-5 over SM-2 is unquestionable. Both in practice and in theory. It is Algorithm SM-2 that has the intervals hard-wired and dependent only on item's difficulty, which is approximated with a heuristic formula (i.e. a formula based on a guess derived from limited pre-1987 experience). It is true that you cannot "spoil" Algorithm SM-2 by feeding it with false data. It is so only because it is non-adaptable. You probably prefer your word processor with customizable fonts even though you can mess up the text by applying Wingdings.

Algorithm SM-2 simply crudely multiplies intervals by a so-called E-Factor, which is a way of expressing item difficulty. In contrast, Algorithm SM-5 collects data about the user's performance and modifies the function of optimum intervals accordingly. In other words, it adapts to the student's performance. Algorithm SM-6 goes even further and modifies the function of optimum intervals so as to achieve a desired level of knowledge retention. The superiority of those newer algorithms has been verified in more ways than one, for example, by measuring the decline in workload over time in fixed-size databases. In the cases studied (small sample), the decline of workload with newer algorithms was nearly twice as fast as in older databases processed with Algorithm SM-2 (same type of material: English vocabulary).

All SuperMemo algorithms group items into difficulty categories. Algorithm SM-2 gives each category a rigid set of intervals. Algorithm SM-5 gives each category the same set of intervals too, however, these are adapted on the basis of user's performance, i.e. not set in stone.

"Bad grading" is indeed more important in Algorithm SM-5 than it is in Algorithm SM-2 as false data will result in "false adaptation". However, it is always bad to give untrue/cheat grades in SuperMemo, whichever algorithm you use.

With incomplete knowledge of memory, adaptability is always superior to rigid models. This is why it is still better to adapt to an imprecise average (as in Algorithm SM-5) than to base the intervals on an imprecise guess (as in Algorithm SM-2). Needless to say, the last word goes to Algorithm SM-8 and later, as it adapts to the measured average.

Evaluation of SuperMemo 5 (1989)

SuperMemo 5 was so obviously superior that I did not collect much data to prove my point. I made only a few comparisons for my Master's Thesis and they left no doubt.

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

3.8. Evaluation of the Algorithm SM-5

The Algorithm SM-5 has been in use since October 17, 1989, and has surpassed all expectations in providing an efficient method of determining the desired function of optimal intervals, and in consequence, improving the acquisition rate (15,000 items learnt within 9 months). Fig. 3.5 indicates that the acquisition rate was at least twice as high as that indicated by combined application of the SM-2 and SM-4 algorithms!

Figure: Changes of the work burden in databases supervised by SM-2 and SM-5 algorithms.

The knowledge retention increased to about 96% for 10-month-old databases. Below, some knowledge retention data in selected databases are listed to show the comparison between the SM-2 and SM-5 algorithms:

  • Date - date of the measurement,
  • Database - name of the database; ALL means all databases averaged
  • Interval - average current interval used by items in the database
  • Retention - knowledge retention in the database
  • Version - version of the algorithm applied to the database
Date Database Interval Retention Version
Dec 88 EVD 17 days 81% SM-2
Dec 89 EVG 19 days 82% SM-5
Dec 88 EVC 41 days 95% SM-2
Dec 89 EVF 47 days 95% SM-5
Dec 88 all 86 days 89% SM-2
Dec 89 all 190 days 92% SM-2, SM-4 and SM-5

In the process of repetition the following distribution of grades was recorded:

Quality Fraction
0 0%
1 0%
2 11%
3 18%
4 26%
5 45%

This distribution, in accordance with the assumptions underlying the Algorithm SM-5, yields the average response quality equal to 4. The forgetting index equals 11% (items with quality lower than 3 are regarded as forgotten). Note that the retention data indicate that only 4% of items in a database are not remembered. Therefore the forgetting index exceeds the percentage of forgotten items 2.7 times.

In a 7-month-old database, it was found that 70% of items had not been forgotten even once in the course of repetitions preceding the measurement, while only 2% of items had been forgotten more than 3 times.

Theoretic proof of superiority of newer algorithms

Anki's criticism of SuperMemo 5 calls for a simple proof in the light of modern spaced repetition theory. We can show that today's model of memory can be mapped onto the models underlying both algorithms, Algorithm SM-2 and Algorithm SM-5, and that the key difference between the two is the missing adaptability of the function of optimal intervals (represented in Algorithm SM-5 by the matrix of optimum factors).

Let SInc=f(C,S,R) be a stability increase function that takes complexity C, stability S, and retrievability R as arguments. This function determines the progressive increase in review intervals in optimum learning.

Both algorithms, SM-2 and SM-5, ignore the retrievability dimension. In theory, if both algorithms worked perfectly, we could assume they aim at R=0.9. As can be measured in SuperMemo, both algorithms fail at that effort because they do not know the relevant forgetting curves. They simply do not collect forgetting curve data. This data-collecting possibility was introduced only in Algorithm SM-6 in 1991.

However, if we assume that the 1985 and 1987 heuristics were perfect guesses, in theory, the algorithm could use SInc=f(C,S) with a constant R of 90%.

Because SM-2 uses the same number, EF, both for the stability increase and for item complexity, in SM-2 the SInc=f(C,S) equation is represented by EF=f'(EF,interval), where it can easily be shown with data that f<>f'. Amazingly, the heuristic used in SM-2 made this function work by decoupling the actual link between the EF and item complexity. As the data show that SInc keeps decreasing with an increase in S, in Algorithm SM-2, by definition, all items would need to gain complexity with each review if EF were to represent item complexity. In practical terms, Algorithm SM-2 uses EF=f'(EF,interval), which translates to SInc(n)=f(SInc(n-1),interval).

Let us assume that the EF=f(EF,interval) heuristic was an excellent guess as claimed by proponents of Algorithm SM-2. Let SInc be represented by O-factor in Algorithm SM-5. We might then represent SInc=f(C,S) as OF=f(EF,interval).

For Algorithm SM-2, OF would be constant and equal to EF; in Algorithm SM-5, OF is adaptable and can be modified depending on the algorithm's performance. It seems pretty obvious that penalizing the algorithm for bad performance by a drop in OF matrix entries, and rewarding it with an increase in OF entries, is superior to keeping OF constant.

In a funny twist, as much as supporters of Algorithm SM-2 claim it performs great, supporters of the neural network SuperMemo kept accusing algebraic algorithms of a lack of adaptability. In reality, the adaptability of Algorithm SM-17 is the best to date, as it is based on the most accurate model of memory.

It is conceivable that the heuristics used in SM-2 were so accurate that the original guess on OF=f(EF,interval) needed no modification. However, as has been shown in practical application, the OF matrix quickly evolves and converges onto the values described in this publication (Wozniak, Gorzelanczyk 1994). They differ substantively from the assumptions wired into Algorithm SM-2.

Summary:

  • sm17 (2016): SInc=f(C,S,R), 3 variables, f is adaptable
  • sm5 (1989): SInc=f(C,S,0.9), 2 variables, f is adaptable
  • sm2 (1987): SInc=f(SInc,S,0.9), 1 variable, f is fixed
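The contrast can be restated in a short, purely illustrative Python sketch (this is not actual SuperMemo source code; OF and ef_category stand for the adaptable matrix and the difficulty-to-column mapping assumed in the earlier sketch): SM-2 multiplies the interval by the very number that encodes difficulty, while SM-5 looks the multiplier up in an adaptable matrix in which difficulty only selects a column.

def sm2_next_interval(prev_interval, ef):
    # SM-2: the stability increase IS the difficulty number EF, fixed by a heuristic
    return prev_interval * ef

def sm5_next_interval(prev_interval, n, ef, OF, ef_category):
    # SM-5: the stability increase is a separate, adaptable OF entry; EF only
    # selects the difficulty column, and the entry is corrected after each repetition
    return prev_interval * OF[n - 1][ef_category(ef)]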

Convergence

Algorithm SM-5 showed fast convergence, which was quickly demonstrated by users who began with univalent OF matrices. It was quite a contrast with Algorithm SM-4.

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

3.6. Improving the predetermined matrix of optimal factors

The optimization procedures applied in transformations of the OF matrix appeared to be satisfactorily efficient resulting in fast convergence of the OF entries to their final values.

However, in the period considered (Oct 17, 1989 - May 23, 1990) only those optimal factors which were characterized by short modification-verification cycles (less than 3-4 months) seem to have reached their equilibrial values.

It will take a few further years before more sound conclusions can be drawn regarding the ultimate shape of the OF matrix. The most interesting fact apparent after analyzing 7-month-old OF matrices is that the first inter-repetition interval should be as long as 5 days for E-Factor equal 2.5 and even 8 days for E-Factor equal 1.3! For the second interval the corresponding values were about 3 and 2 weeks respectively.

The newly-obtained function of optimal intervals could be formulated as follows:

I(1)=8-(EF-1.3)/(2.5-1.3)*3

I(2)=13+(EF-1.3)/(2.5-1.3)*8

for i>2 I(i)=I(i-1)*(EF-0.1)

where:

  • I(i) - interval after the i-th repetition (in days)
  • EF - E-Factor of the considered item.

To accelerate the optimization process, this new function should be used to determine the initial state of the OF matrix (Step 3 of the SM-5 algorithm). Except for the first interval, this new function does not differ significantly from the one employed in Algorithms SM-0 through SM-5. One could attribute this fact to inefficiencies of the optimization procedures which, after all, are prejudiced by the fact of applying a predetermined OF matrix. To make sure that this was not the case, I asked three of my colleagues to use experimental versions of SuperMemo 5.3 in which univalent OF matrices were used (all entries equal to 1.5 in two experiments and 2.0 in the remaining experiment).
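Translating this function into an initial OF matrix (Step 3 of Algorithm SM-5) follows directly from I(n,EF)=OF(n,EF)*I(n-1,EF). A hypothetical Python sketch (not part of the original thesis text), with all names assumed:

def initial_of_entry(n, ef):
    """Initial OF(n,EF) implied by the 1990 function of optimal intervals."""
    i1 = 8 - (ef - 1.3) / (2.5 - 1.3) * 3     # I(1) in days
    i2 = 13 + (ef - 1.3) / (2.5 - 1.3) * 8    # I(2) in days
    if n == 1:
        return i1            # because I(1,EF) = OF(1,EF)
    if n == 2:
        return i2 / i1       # because I(2,EF) = OF(2,EF) * I(1,EF)
    return ef - 0.1          # because I(i) = I(i-1) * (EF - 0.1) for i > 2

For EF=2.5 this reproduces the 5-day first interval and the roughly 3-week second interval quoted above; for EF=1.3 it gives 8 days and about 2 weeks.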

Although the experimental databases have been in use for only 2-4 months, the OF matrices seem to slowly converge to the form obtained with the use of the predetermined OF matrix. However, the predetermined OF matrices inherit the artificial correlation between E-Factors and the values of OF entries in the relevant E-Factor category (i.e. for n>3 the value OF(n,EF) is close to EF). This phenomenon does not appear in univalent matrices which tend to adjust the OF matrices more closely to requirements posed by such arbitrarily chosen elements of the algorithm as initial value of E-Factors (always 2.5), function modifying E-Factors after repetitions etc.

Matrix smoothing

Some of the criticism from the author of Anki might have been a result of not employing matrix smoothing, which was an important component of the algorithm and is still used in Algorithm SM-17.

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

3.7. Propagation of changes across the matrix of optimal factors

Having noticed the earlier mentioned regularities in the relationships between entries of the OF matrix, I decided to accelerate the optimization process by propagation of modifications across the matrix. If an optimal factor increases or decreases, then we could conclude that the OF factor that corresponds to the higher repetition number should also increase or decrease accordingly.

This follows from the relationship OF(i,EF)=OF(i+1,EF), which is roughly valid for all E-Factors and i>2. Similarly, we can consider desirable changes of factors if we remember that for i>2 we have OF(i,EF')=OF(i,EF)*EF'/EF (esp. if EF' and EF are close enough). I used the propagation of changes only across those entries of the OF matrix that had not yet been modified by repetition feedback. This proved particularly successful in the case of the univalent OF matrices applied in the experimental versions of SuperMemo mentioned in the previous paragraph.

The proposed propagation scheme can be summarized as follows:

  1. After executing Step 7 of the Algorithm SM-5 locate all neighboring entries of the OF matrix that have not yet been modified in the course of repetitions, i.e. entries that did not enter the modification-verification cycle. Neighboring entries are understood here as those that correspond to the repetition number +/- 1 and the E-Factor category +/- 1 (i.e. E-Factor +/- 0.1)
  2. Modify the neighboring entries whenever one of the following relations does not hold:
    • for i>2 OF(i,EF)=OF(i+1,EF) for all EFs
    • for i>2 OF(i,EF')=OF(i,EF)*EF'/EF
    • for i=1 OF(i,EF')=OF(i,EF)
    The selected relation should hold as the result of the modification
  3. For all the entries modified in Step 2 repeat the whole procedure locating their yet unmodified neighbors.
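The procedures behind this scheme were never published in full, so the following Python sketch is only a guess at how the propagation might be coded; the four-entry neighborhood, the visited-flag bookkeeping, and all identifiers are assumptions (rows for the lowest repetition numbers are handled like i>2 for simplicity):

def propagate(OF, modified, row, col, ef_of_col):
    """Spread a change at OF[row][col] to neighbors never touched by repetitions.
    modified[r][c] is True once an entry entered the modification-verification
    cycle; ef_of_col(c) returns the E-Factor represented by column c."""
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < len(OF) and 0 <= nc < len(OF[0])) or modified[nr][nc]:
                continue
            if nc == c:
                value = OF[r][c]                                  # OF(i,EF) = OF(i+1,EF)
            elif r == 0:
                value = OF[r][c]                                  # OF(1,EF') = OF(1,EF)
            else:
                value = OF[r][c] * ef_of_col(nc) / ef_of_col(c)   # OF(i,EF') = OF(i,EF)*EF'/EF
            if abs(OF[nr][nc] - value) > 1e-9:
                OF[nr][nc] = value
                stack.append((nr, nc))    # Step 3: repeat for the newly modified entries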

Propagation of changes seems to be inevitable if one remembers that the function of optimal intervals depends on such parameters as:

  • student's capacity
  • student's self-assessment habits (the response quality is given according to the student's subjective opinion)
  • character of the memorized knowledge etc.
Therefore it is impossible to provide an ideal, predetermined OF matrix that would dispense with the use of the modification-verification process and, to a lesser degree, propagation schemes.

Random dispersal of intervals

One of the key improvements in Algorithm SM-5 was the random dispersal of intervals. On one hand, it dramatically accelerated the optimization process; on the other, it caused a great deal of confusion among users of nearly all future versions of SuperMemo: "why do the same items with the same grade use a different interval at each try?". Minor deviations are precious. This was laid bare when the "naked" Algorithm SM-17 was released in early SuperMemo 17. It could be seen that users who keep a lot of leeches in their collections would easily hit "local minima" from which they could never get out. The random dispersal was restored with some delay. The period of "nakedness" was needed for accurate observations of the algorithm, esp. in a multi-decade learning process like my own.

Archive warning: Why use literal archives?
This text was part of: "Optimization of learning" by Piotr Wozniak (1990)

3.5. Random dispersal of optimal intervals

To improve the optimization process further, a mechanism was introduced that may seem to contradict the principle of optimal repetition spacing. Let us reconsider a significant flaw of the Algorithm SM-5: a modification of an optimal factor can be verified for its correctness only after the following conditions are met:

  • the modified optimal factor is used in the calculation of an inter-repetition interval
  • the interval elapses and the following repetition provides the grade that verifies the modification

This means that even a great number of instances used in modification of an optimal factor will not change it significantly until the newly calculated value is used in determination of new intervals and verified after their elapse.

The process of verification of modified optimal factors after the period necessary to apply them in repetitions will later be called the modification-verification cycle. The greater the repetition number the longer the modification-verification cycle and the greater the slow-down in the optimization process.

To illustrate the problem of modification constraint let us consider calculations from Fig. 3.4.

One can easily conclude that for the variable INTERVAL_USED greater than 20 the value of MOD5 will equal 1.05 if the QUALITY equals 5. As the QUALITY=5, the MODIFIER will equal MOD5, i.e. 1.05. Hence the newly proposed value of the optimal factor (NEW_OF) can only be 5% greater than the previous one (NEW_OF:=USED_OF*MODIFIER). Therefore the modified optimal factor will never reach beyond the 5% limit unless the USED_OF increases, which is equivalent to applying the modified optimal factor in the calculation of inter-repetition intervals.

Bearing these facts in mind I decided to let inter-repetition intervals differ from the optimal intervals in certain cases to circumvent the constraint imposed by the modification-verification cycle.

The process of random modification of optimal intervals will be called "dispersal".

If a small fraction of intervals is allowed to be shorter or longer than would follow from the OF matrix, then these deviant intervals can accelerate the changes of optimal factors by letting them drop or increase beyond the limits of the mechanism presented in Fig. 3.4. In other words, when the value of an optimal factor is much different from the desired one, its accidental change caused by deviant intervals will not be leveled by the stream of standard repetitions, because the response qualities will rather promote the change than act against it.

Another advantage of using intervals distributed around the optimal value is the elimination of a problem which often was a matter of complaints voiced by SuperMemo users: the clustering of the repetition schedule. By the clustering of the repetition schedule I mean the accumulation of repetitory work on certain days while neighboring days remain relatively unburdened. This is caused by the fact that students often memorize a great number of items in a single session and these items tend to stick together in the following months, being separated only on the basis of their E-Factors.

Dispersal of intervals around the optimal ones eliminates the problem of clustering. Let us now consider the formulas that were applied by the latest SuperMemo software in the dispersal of intervals in the proximity of the optimal value. Inter-repetition intervals that are slightly different from those which are considered optimal (according to the OF matrix) will be called near-optimal intervals. The near-optimal intervals will be calculated according to the following formula:

NOI=PI+(OI-PI)*(1+m)

where:

  • NOI - near-optimal interval
  • PI - previous interval used
  • OI - optimal interval calculated from the OF matrix (cf. Algorithm SM-5)
  • m - a number belonging to the range <-0.5,0.5> (see below)

or using the OF value:

NOI=PI*(1+(OF-1)*(1+m))

The modifier m will determine the degree of deviation from the optimal interval (maximum deviation for m=-0.5 or m=0.5 values and no deviation at all for m=0).

In order to find a compromise between accelerated optimization and elimination of clustering on one hand (both require strongly dispersed repetition spacing) and the high retention on the other (strict application of optimal intervals required) the modifier m should have a near-zero value in most cases.

The following formulas were used to determine the distribution function of the modifier m:

  • the probability of choosing a modifier in the range <0,0.5> should equal 0.5:
    integral from 0 to 0.5 of f(x)dx=0.5
  • the probability of choosing a modifier m=0 was assumed to be hundred times greater than the probability of choosing m=0.5:
    f(0)/f(0.5)=100
  • the probability density function was assumed to have a negative exponential form with parameters a and b to be found on the base of the two previous equations:
    f=a*exp(-b*x)

The above formulas yield values a=0.04652 and b=0.09210 for m expressed in percent.
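The quoted values can be reproduced directly from the two conditions above; a small verification sketch in Python (not part of the original text; m is taken in percent, so the one-sided range is 0..50):

import math

b = math.log(100) / 50                  # from f(0)/f(50) = exp(50*b) = 100
a = 0.5 * b / (1 - math.exp(-50 * b))   # from the integral of a*exp(-b*x) over 0..50 equal to 0.5
print(a, b)                             # approximately 0.04652 and 0.09210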

From the distribution function

integral from -m to m of a*exp(-b*abs(x))dx = P (P denotes probability)

we can obtain the value of the modifier m (for m>=0):

m=-ln(1-b/a*P)/b

Thus the final procedure to calculate the near-optimal interval looks like this:

a:=0.047;

b:=0.092;

p:=random-0.5;

m:=-1/b*ln(1-b/a*abs(p));

m:=m*sgn(p);

NOI:=PI*(1+(OF-1)*(100+m)/100);

where:

  • random - function yielding values from the range <0,1) with a uniform distribution of probability
  • NOI - near-optimal interval
  • PI - previously used interval
  • OF - pertinent entry of the OF matrix
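For convenience, a runnable Python rendering of the procedure above (illustrative only; the original SuperMemo implementation was never published in this form):

import math
import random

def near_optimal_interval(previous_interval, of):
    """Disperse the next interval around the optimum previous_interval*OF."""
    a, b = 0.047, 0.092
    p = random.random() - 0.5                    # uniform in <-0.5, 0.5)
    m = -1 / b * math.log(1 - b / a * abs(p))    # magnitude of the deviation, in percent
    m = math.copysign(m, p)                      # restore the sign of the deviation
    return previous_interval * (1 + (of - 1) * (100 + m) / 100)

For example, an item with OF=2.0 and a previous interval of 10 days receives a new interval dispersed around the optimum of 20 days, most often staying close to it.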