Algorithm SM-17

From supermemo.guru
Jump to navigation Jump to search

This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)

Introduction

Algorithm SM-17 is a modern version of the spaced repetition algorithm used in SuperMemo. It was developed, implemented, and tested in the years 2014-2016.

SuperMemo had a fertile impact on the research in modeling long-term memory. One of the most interesting fruits of that research is the two component model of memory. Over the last two decades, it has been hinted on many occasions that the model may provide a sound theoretical basis for a better and more universal approach to spaced repetition. Algorithm SM-17 is a successful conversion of the theory into a practical application. Due to its universal nature, the algorithm should, in the future, find its way to all SuperMemo products.

Two component model

The two component model of long-term memory underlies Algorithm SM-17. The model asserts that two variables are sufficient to describe the status of unitary memory in a healthy human brain:

  • stability - this variable determines how long a memory can last if undisturbed and if not retrieved
  • retrievability - this variable determines the probability of retrieving a memory at any given time since the last review/recall

In the literature on memory, an ill-defined term of memory strength will often stand for either stability (S) or retrievability (R). This results in a great deal of confusion that slows down the progress of memory research. In 1995 (and earlier), we have proven theoretically that the two variables: R and S are sufficient to describe the status of memory in spaced repetition.

Algorithm SM-17 is the ultimate practical proof for the correctness of the two component model.

A graph of actual changes in the value of the two components of memory provides a conceptual visualization of the evolving memory status:

SuperMemo: Changes in two variables of long-term memory over time
SuperMemo: Changes in two variables of long-term memory over time

Figure: Changes in memory status over time for an exemplary piece of knowledge. The horizontal axis represents time spanning the entire repetition history. The top panel shows retrievability (tenth power, R10, for easier analysis). Retrievability grid in gray is labelled by R=99%, R=98%, etc. The middle panel displays optimum intervals in navy. Repetition dates are marked by blue vertical lines and labelled in aqua. The end of the optimum interval where R crosses 90% line is marked by red vertical lines (only if intervals are longer than optimum intervals). The bottom panel visualizes stability (presented as ln(S)/ln(days) for easier analysis). The graph shows that retrievability drops fast (exponentially) after early repetitions when stability is low, however, it only drops from 100% to 94% in long 10 years after the 7th review. All values are derived from an actual repetition history and the three component model of memory.

Past vs. Future

All SuperMemo algorithms before the year 2015 took one essential assumption that limited their universal applicability in learning: the review of the learning material should occur at optimum timing determined by SuperMemo. Algorithms employed in SuperMemo 11 through 16 included weak heuristics that would allow of repetition at any time without a noticeable detriment to learning. In particular, those heuristics would help overcome the impact of the spacing effect where frequent reviews lead to less impact or even weakening of memories. New research indicates that the applicability of those heuristics was limited to the less extreme departures from the optimum schedule. The new algorithm, in theory, should make it possible to optimize review for any repetition history, and any repetition timing. It should also make it possible to simulate the status of memory produced with different variants of spaced repetition algorithms (e.g. those used by SuperMemo in remote past).

Weaknesses of Algorithm SM-15

Here is the list of major weaknesses of Algorithm SM-15 as revealed by comparisons with Algorithm SM-17:

  • lack of the retrievability dimension in mapping the function of optimum intervals
  • weak spacing effect heuristic (e.g. performing poorly in highly massed repetitions)
  • memory-less difficulty determination method (new A-Factor would be derived from the current estimated forgetting index and the O-Factor matrix without taking into account past difficulty estimates and repetition history)
  • incremental changes to item difficulty would produce nice-looking difficulty distributions that would not truly reflect actual item difficulty
  • heterogeneous streams of items passed via the optimization matrices resulted in clusters of "polluted" entries
  • relying on the best-fit approximation of optimum factors would not reflect variations caused by user strategy, or item heterogeneity
  • lack of filtering for the power law in heterogeneous forgetting curves. In particular, the first review forgetting curve is better approximated with a power function, which shows only in collections with substantial first review delay (beyond 100-200 days)
  • excess reliance on the weak correlation between grades and the estimated forgetting index
  • weak first grade vs. A-Factor correlation. For a flat correlation only extreme values of A-Factor would be used after the first repetition. Such a flat correlation would often develop in well-managed old collections
  • relying on the forgetting index as the main optimization criterion may be unreliable for heterogeneous learning material where difficulty streams are not well separated
  • lack of tools for maximizing the speed of learning for varying levels of the forgetting index
  • lack of tools for maximizing the increase in memory stability for varying levels of the forgetting index
  • first post-lapse interval would not differentiate between items whose forgetting was caused by different degrees of delay (rather than overestimated difficulty)
  • assuming optimality of the schedule: manual interventions into the length of intervals result in discarding the stability record, i.e. well-remembered items with short intervals are treated in the same way as other items with the same interval length and with the same difficulty.

See also:

Strengths of Algorithm SM-17

  • extending review optimization to various levels of recall (from cramming to decade-long delays)
  • optimum interval is always based on the full history of repetitions
  • data-based improvements to the algorithm can instantly be applied in the learning process by re-computing optimization matrices at 500,000 reps/min (previously, it would take months and years to collect corrective data)
  • best fit difficulty determination
  • universal memory formulas make it possible to simulate any imaginable learning outcomes, e.g. computing the efficiency of older algorithm post factum
  • universal memory formulas make it possible to simulate review scenarios for individual items, e.g. computing maximum learning speed schedules, or best average retention schedules, or probability of recall at chosen interval, etc.
  • separating the first review forgetting curve and the first post-lapse review forgetting curves from the forgetting curve matrix for the sake of keeping homogeneous flow of items when computing changes in recall and memory stability
  • employing power law for the first review forgetting curve as more accurate for long-interval review. Shorter first interval should be well compensated by faster increase in successive intervals for well-formulated items
  • less sensitivity to chaotic oscillation in memory stability caused by inaccurate stability or retrievability estimates. Those oscillations are not visible in older algorithms as stability estimates were not available to juxtapose against the interval used in the learning process
  • expressing first post-lapse interval as a function of retrievability (in addition to the number of lapses) yields better Recall-Retrievability matching later in the learning process
  • expressing first post-lapse interval as a function of retrievability makes it easier to estimate the difficulty of forgotten items by filtering out lapses caused by delays
  • 35-quantile forgetting curve for the first review spans a decade (rather than just 3 weeks as in older versions of the algorithm)

Weaknesses of Algorithm SM-17

  • unlike all prior versions of the algorithm, Algorithm SM-17 requires a full record of repetition history for all items. This adds an additional burden on developers and licensees
  • one-to-one mapping between recall and retrievability remains elusive. This is the main validity test for the underlying theory of memory. There is still room for improvement, esp. for difficult items
  • older versions of the algorithm would keep 255 most recent repetition outcomes when plotting multidimensional forgetting curves. Algorithm SM-17 keeps the full record of repetitions. This may make it more "conservative" in adapting to user's own progress (e.g. in formulating items). However, users of auto-memorized Advanced English 2014 showed dramatically bad recall predicted by the first forgetting curve in SuperMemo 16. This would be less prominent in Algorithm SM-17. Amending the inflexibility of Algorithm SM-17 is still in consideration (e.g. by recency weighing)

Theory

Two component memory model

The story and the implications of the two component model of memory are described in this article.

Due to the fact that real-life application of SuperMemo require tackling learning material of varying difficulty, the third variable involved in the model is item difficulty (D). Some of the implication of item difficulty have also be discussed in the above article. In particular, the impact of composite memories with subcomponents of different memory stability (S). The three component model of memory is referred to as the DSR model.

For the purpose of the new algorithm we have defined the three components of memory as follows:

Historical note: three variables of memory

The three variables of memory were introduced gradually into SuperMemo algorithms:

  • Difficulty (D) was first introduced in SuperMemo 1.0 (1987) in the form of E-Factors that reflected difficulty. E-Factors would decrease each time items performed poorly in learning
  • Stability (S) was first hinted at in SuperMemo 5.0 (1989), which was able to demonstrate the decline in stability increase with longer intervals (decrease in O-Factors with the increase in repetition number)
  • Retrievability (R) was first accounted for in SuperMemo 11 (2002), which used a heuristic formula to compensate for spacing effect in massed presentation and in delayed review

All three variables can now be computed precisely using repetition histories in Algorithm SM-17.

Deviations

To assess how well computer models fit reality, SuperMemo uses its own deviation measures. In particular:

  1. the deviation between grades and predicted grades derived from retrievability when computing the SInc[] matrix, and
  2. the deviation between matrices and theoretical symbolic function (in particular, the entries of the SInc[] matrix and the theoretically predicted values of the stability increase function Sinc()).

Grade deviation

SuperMemo converts grades to average retrievability correlated with those grades for a given user (every user has his own system of grading and different levels of average retrievability will correspond with individual grades). The grade-derived deviation (gDev) is the difference between grade-derived retrievability (Rg) and theoretical retrievability (Rt) derived from stability estimate (Sw). The fail-success deviation (fDev) is 1-R for grades>2 and R for grades<3.

In estimating the deviation of grades from predicted grades, SuperMemo uses the formula:

Dev=BGW*fDev+(1-BGW)*abs(gDev)

where:

  • Dev - prediction deviation
  • BGW - binary vs. grade-based weight (currently chosen to be 70%)
  • fDev - binary success-failure deviation
  • gDev - grade derived retrievability deviation

The deviation for an item repetition history is computed as the square root of the average of squared deviations (Dev) for each repetition.

The deviation for a collection is computed as the square root of the average of squared deviations (Dev) for each repetition in each repetition history of each item.

In least squares comparison, for items or for collections, the least squares deviation is:

LSDev=sqrt(sum(square(Dev))/count)

where:

  • LSDev - least squares deviation for all grade predictions (Dev) in a subset of repetition histories
  • count - number of repetitions in the set
  • sqrt - square root

The assumed weighing (BGW) is a heuristic based on experiments and the fact that grade-retrievability correlation makes for a weak prediction of grades based on R. The said deviation formula is subject to change.

The deviation used in estimating difficulty is weighed further to increase the impact of more recent repetitions on the current difficulty estimate.

Currently, for large collections, the algorithm can generate stability increase matrix which produces a deviation of 18%. Considering the stochastic nature of forgetting, this number would be close to the lowest possible theoretical value if only fail-success deviation (fDev) was used. However, grade-derived deviation (gDev) reduces the span of the deviation (by making R predictions more precise).

Theoretical minimum to a grade deviation

When using predicted grade deviations in assessing the strength of particular algorithmic solutions, we need to keep in mind that there is a theoretical limit on the minimum value of the deviation. This means that we will never approach the ideal 0% deviation.

For the formulas used above, the limit will depend on weights (e.g. BGW) and particular user's grade-retrievability averages.

If only binary success-failure deviation (fDev) was used, and all items were scheduled with a perfect interval at the requested forgetting index of 10%, the minimum deviation would correspond with the retrievability prediction of 0.9. That deviation would then be:

sqrt((R-Failure)*(R-Failure)*P(failure)+(R-Success)*(R-Success)*P(success))

which is:

sqrt((0.9-0)*(0.9-0)*0.1+(0.9-1)*(0.9-1)*0.9)=30%

In addition to the least square deviation, SuperMemo 17 will provide a few other metrics for assessing the effectiveness of the algorithm. If you want to find your metric in SuperMemo options, please join the discussion: Spaced repetition algorithm metric

Matrix deviation

SuperMemo uses matrix representations to express various functions, and hill-climbing approximations to find best theoretical symbolic matches for those functions for use in various contexts.

To guide the hill-climbing algorithms, a separate deviation measure is used as an optimization criterion. For each individual matrix entry:

Dev=sqr(abs(MatrixValue-ApproximationValue)/MatrixValue))

The deviation for the entire matrix is the square root of the average of individual deviations (Dev). SuperMemo expresses that deviation in percent.

Approximations

SuperMemo uses hill-climbing algorithms to match hypothetical functions that might model data categories of interest for the algorithm. Those functions can later be used to speed up calculations of Algorithm SM-17. Currently, three best fit approximation procedures are used to verify theoretical formulas used in the algorithm. Those approximations determine the following functions:

  • stability increase (arguments: D, S, R)
  • recall (arguments: D, S, R)
  • first post-lapse interval (arguments: Lapses, R)

Those approximation procedures may also be used in research that helps validate the algorithm. For example, to prove the exponential nature of forgetting, take additive and multiplicative variants of approximation formulas for Recall[D,S,R] with separate set of parameters defining power and exponential components along the R dimension. Outcome: despite the fact that heterogeneous forgetting curves may appear to follow the power law, Algorithm SM-17 is filtering items for difficulty well enough to eliminate the power component entirely from the equation. Note that the forgetting curve describing the first review does not form a part of the Recall[] matrix. This is because a heterogeneous stream of new items might pollute the matrix and initiate the "ripple effect" of distorted estimates for higher stability levels (as in older SuperMemos). That first review forgetting curve follows the power law pretty well and, unlike in older SuperMemos, power regression is used to produce the default startup stability for new items.

Approximation parameters in student assessment

There are several hypothetical correlations between approximation parameters and characteristics of user's learning process and knowledge quality. For example:

  • RMax in stability increase may be a good overall indicator of the quality of learning (the higher the RMax, the closer the approach to the optimum characterized by minimum information principle)(RMax is retrievability that maximized the increase in stability)
  • S Decay in stability increase may be a good indicated of the quality of knowledge (the worse the formulation, the higher S Decay)(S Decay is the decay constant determining how fast SInc[] decreases in the S dimension)
  • Recall Intercept in recall approximation is higher in well formulated collections (Recall Intercept is a measure of the agreement between Recall[] matrix and the theoretical exponential decline in recall with time)

The Algorithm

Intro

Algorithm SM-17 computes new intervals using the stability increase function. This function tells SuperMemo how much intervals should increase for all possible combinations of memory status variables: Difficulty, Stability and Retrievability. SuperMemo determines item difficulty by finding which difficulty level provides the best fit to grades scored in repetition history. The first interval is computed in a way that is similar to earlier SuperMemos. The stability increase function is computed using data collected during repetitions.

Stabilization

Central to Algorithm SM-17 is the stability increase function (stabilization) represented by the stability increase matrix (abbreviated to SInc[]). The simple formula at the core of SuperMemo is:

Int[n]=Int[n-1]*SInc[D,S,R]

This means that optimum intervals in SuperMemo increase by SInc[] in each repetition (assuming the forgetting index is 10%). D, S, R stand for 3 memory variables: difficulty, stability and retrievability respectively.

Stability increase function
Stability increase function

Figure: Stability increase function as computed by Algorithm SM-17. The function takes three arguments: (1) stability at review expressed in days (on the left), (2) retrievability at review (on the right), and (3) memory complexity expressed as item difficulty (slider labelled Diff currently set at 0.8). In the picture, stability increase peaks at around 15 (vertical axis). For some levels of stability and retrievability, the function drops below 1.0 indicating a decrease in stability (e.g. due to premature review evoking the spacing effect in massed presentation). 61,768 repetitions of relatively difficult items were needed to produce this graph (at Diff=0.8). The longest intervals reach around 14 years (stability quantile 5172)

Startup stability and startup interval

Before the formula for the increase in the length of intervals can be used, we need to determine the initial startup stability and the first interval to use in repetitions:

The relevant formulas used to determine initial intervals in SuperMemo are:

Int[1]=SI (for the first repetition)

Int[1]=PLS (for the first review after a failing grade)

Int[2]=S0*SInc[D,S,R]

Historical note: adding retrievability dimension

The SInc[] matrix is similar to the optimum factor matrix from earlier SuperMemos (OF matrix in SuperMemo 5.0 through SuperMemo 16.0), but it takes different arguments. In the same way as OF-matrix determined the increase of intervals, SInc[] is used in the exactly same way except it takes three arguments: Difficulty, Stability and Retrievability. OF-matrix was indexed by A-Factors (reflecting Difficulty) and repetition category (roughly related to Stability). However, OF matrix did not include the Retrievability dimension as repetitions were supposed to be executed at optimum intervals determined by the requested forgetting index.

Item difficulty

Difficulty: Definition

Item difficulty in SuperMemo Algorithm SM-17 is a number from 0 (easiest) to 1 (hardest). Items are linearly mapped in the 0..1 difficulty range by using their maximum possible memory stability increase. Maximum increase in stability occurs in the first review when the length of the interval corresponds with retrievability that produces the greatest increase in the strength of memory. Items with theoretically highest possible stability increase are assigned difficulty of 0. These are the easiest items in the process. Those items are the target of all forms of learning as they are a perfect reflection of the minimum information principle. As the maximum stability increase for very difficult items may approach 1.0 (or even drop below that value, i.e. produce memory lability!), all items at certain lowest acceptable maximum stability increase are classified as having difficulty 1.0. The exact cut off number is the parameter of the algorithm that is subject to change and further optimization (see Value N in Computing difficulty algorithm below). See also: Item difficulty in Algorithm SM-18

Historical note: measuring difficulty

As of Algorithm SM-8, SuperMemo used the concept of A-Factor to define item difficulty. A-Factor was defined as the increase in optimum interval during the first repetition (RepNo=2 in SuperMemo). In most situations, A-Factor would fall in the 1.0-10.0 range. SuperMemo employed the range from 1.2 to 6.9 where the 1.2 limit was set for the sake of filtering out the so-called "leech material", i.e. material whose difficulty would disqualify it from efficient use of spaced repetition. Algorithm SM-17 might have used that A-Factor definition or strive at a more universal concept of maximum increase in memory stability. Those two approaches do not need to differ much as newer data indicate that the maximum stability increase occurs at retrievability (R) that are close to the default levels used in SuperMemo. Algorithm SM-17 normalizes difficulty in the 0 (easiest) to 1 (hardest) range to roughly correspond with A-Factors 1.1 to 10.0. The assumption is that items that are harder than A-Factor=1.1 should be eliminated from the learning process, while items that are easier than A-Factor=10.00 are either extremely rare or can safely be treated along the rest of 10.0 items with a negligible impact on total workload.

Computing difficulty

Algorithm SM-17 requires a full history of repetitions to compute item difficulty (see below for a simplified procedure that goes around that requirement for developers who might wish to develop light SuperMemos that might look at smaller data sets for mobile devices, etc.).

For the full range of possible item difficulties, Algorithm SM-17 computes the deviations of predicted grades (based on a given assumed difficulty) from the actual grades as recorded in repetition history of a given item. The algorithm chooses the difficulty that produces the least deviation.

Computing difficulty: Algorithm outline

Computing difficulty:

  1. Set S to the default optimum first interval from the first forgetting curve for mixed difficulty material
  2. For all difficulty quantiles, using Algorithm SM-17, compute the deviation of predicted grades based on that difficulty, S0, and the actual repetition history from the grades scored in that repetition history
  3. Find maximum and minimum quantiles with lowest deviation (very often, several quantiles will produce the exactly same deviation as a real number retrievability is converted to grades, which are integers)
  4. Assume that the item in question has a difficulty that corresponds with the average of minimum and maximum quantiles that generate the minimum deviation

Note that weighing quantiles by deviations tends to amass most difficulties around 0.5, which shows that this is not a valid approach.

Computing the deviation:

  1. set S to S0
  2. for each repetition, using the interval, compute new S using SInc[] matrix indexed by the assumed difficulty, S, and R (corresponding with the interval)
  3. for each repetition, using S and interval, compute R and convert it to a predicted grade (using R:grade mapping characteristic for the user)
  4. compare that grade to the actual grade and add the difference to the deviation

Some important considerations in difficulty calculations:

  1. recent repetitions have a greater weight than older repetitions due to the fact that difficulty is subject to change over time (e.g. due to new learning or minor item reformulation)
  2. difficulty can be computed using theoretical SInc() function, or actual SInc[] matrix data. The outcomes will differ. In particular, the range of minimum and maximum quantiles with the minimum deviation may be larger when using real/irregular data. The superiority of one of the approaches is hard to establish

The solution adopted in Algorithm SM-17 was to use BestSInc() function that transitions from the theoretic SInc() towards data-based SInc[] with the gradual inflow of new data.

Theoretical SInc() function is the root where the baseline mapping of maximum stability increase to difficulty is established:

SInc:=(5*(1-Difficulty)+1)*f(R,S)+1

where:

  • SInc - stability increase
  • Difficulty - difficulty in 0..1 range
  • f(R,S) - component of SInc that depends on R and S

f(R,S) can be derived by curve fitting from a data-rich SInc[D,S,R] by ignoring the difficulty dimension. It should then be the best approximation of SInc[S,R]. In that sense, SInc[] is circular. Its first approximation is derived from data, the new value is determined to obtain a better approximation, etc. This exercise is necessary for good interval approximations before any data is obtained in a new collection.

This baseline formula currently determines the cutoff value for hardest possible items at SInc that is N times less than the maximum value possible for easiest items (with N=(6*f(R,S)+1)/(f(R,S)+1)).

Computing difficulty: Simplified procedure

We have also developed a simplified solution for computing item difficulties. This might be useful for developers who look for simplicity, thinner and faster code. The simplified procedure is similar to the approach used in older SuperMemos, esp. SuperMemo 8 through SuperMemo 16.

Item difficulty may be made history-less by relying on the fact that difficulty of an item keeps evolving depending on the context of knowledge and through interference. For that reason, the performance in the most recent repetition counts more than performance in prior repetitions.

Simplified difficulty can then be obtained using the formula:

D[n]=sDF(D[n-1],Grade,Rr)

where:

  • sDF - simplified difficulty function
  • D[n] - difficulty estimate after the n-th repetition
  • Grade - grade scored in the n-th repetition
  • Rr - expected recall at the time of the n-th repetition (based on the Recall[D,S,R] matrix)

Unlike best-fit difficulty, simplified difficulty requires no expensive hill-climbing optimization that may slow down SuperMemo (e.g. when computing learning data for long repetition histories). The accuracy of the simplified difficulty estimation is bound to be less and is currently being estimated.

A similar simplified approach was also adopted in SuperMemo 18. The new approach may be as effective or superior to the approach used in Algorithm SM-17. Its implementation is greatly simplified, and the cost of computing is dramatically less (no hill climbing). See: Item difficulty in Algorithm SM-18

Startup interval

The first interval called startup interval (Int[1] or SI) is determined using the first forgetting curve. Unlike the entries of the Recall[] matrix, the first forgetting curve data is collected using 35 quantiles determined by the interval distribution on a large representative sample collected from an incremental reading collection. Those quantiles span nearly a decade (older SuperMemos collected data for only the first 20 days after memorizing the item). The startup interval (the first optimum interval) is determined by power regression on that forgetting curve data. Power law forgetting is the result of the superposition of exponential forgetting curves of heterogeneous indeterminate difficulty material entering the learning process. The first interval is corrected for the requested forgetting index (rFI) and may be subject to random dispersion that is a standard fixture of SuperMemo algorithms since 1989.

In short, the startup interval is determined by the point in time at which the forgetting curve indicates the recall is likely to fall below the value determined by the requested forgetting index.

Note that the startup interval (with rFI=10%) is not the same as the startup stability. The interval needs to be determined before the first review (i.e. be a resultant of prior repetitions of other items). The startup stability is determined from repetition histories and may differ after each successive repetition. In short, it represents the best fit to data.

Forgetting curve collected with SuperMemo 17
Forgetting curve collected with SuperMemo 17

Figure: The first forgetting curve for newly learned knowledge collected with SuperMemo. Power approximation is used in this case due to the heterogeneity of the learning material freshly introduced in the learning process. Lack of separation by memory complexity results in superposition of exponential forgetting with different decay constants. On a semi-log graph, the power regression curve is logarithmic (in yellow), and appearing almost straight. The curve shows that in the presented case recall drops merely to 58% in four years, which can be explained by a high reuse of memorized knowledge in real life. The first optimum interval for review at retrievability of 90% is 3.96 days. The forgetting curve can be described with the formula R=0.9906*power(interval,-0.07), where 0.9906 is the recall after one day, while -0.07 is the decay constant. In this is case, the formula yields 90% recall after 4 days. 80,399 repetition cases were used to plot the presented graph. Steeper drop in recall will occur if the material contains a higher proportion of difficult knowledge (esp. poorly formulated knowledge), or in new students with lesser mnemonic skills. Curve irregularity at intervals 15-20 comes from a smaller sample of repetitions (later interval categories on a log scale encompass a wider range of intervals)

Startup stability

Startup stability S[0] is the stability accomplished with the first single review. Unlike stability computed at later repetitions, this stability cannot be derived from the SInc[] matrix. Instead, it is computed in a way similar to the way difficulty is computed. Within a range of possible values, S[0] is used to compute memory stabilities after successive repetitions. The startup stability S[0] is taken as the value that produces the least predicted grade deviation from grades in repetition history.

For histories with perfect recall, infinite stabilities minimize the grade deviation. This is why startup stability needs to be limited to the maximum allowed startup stability (S0Max). For all repetition histories in a large dataset, we have plotted the impact of S0Max on the overall grade deviation. It seems that for most users, the deviation does not improve beyond S0Max of 3 months. There are always items that will never fail (e.g. due to a real-life repetition), but very few memorized items can survive more than 3 months without the first review.

Reducing S0Max further (e.g. to 1 month), results in SInc[] matrix compensation, which tends to yield similar stabilities after a couple of repetitions. However, lower S0Max results in a higher grade deviation.

Post-lapse stability

Post-lapse stability (PLS) is the stability after a review with a failing grade. Unlike stability computed after successful repetitions, this stability cannot be derived from the SInc[] matrix.

In the ideal case, for simple memories, forgetting results in a reset of stability back to near-zero. In theory, only difficult items made of composite memories may show a substantial decrease in the costs of re-learning, however, even that does not show in data (perhaps due to imperfect difficulty estimations or due to SuperMemo's natural tendency to help users eliminate hard-to-learn material in the learning process).

It has been shown long ago that the length of the first post-lapse optimum interval is best correlated with the number of memory lapses afforded an item. Even then, post-lapse interval usually oscillates in the 1-4 days range for forgetting index of 10%, and the correlation is not very useful in adding to the efficiency of learning in optimizing the length of that interval. Some competitive spaced repetition software, as well as SuperMemo in its first years, experimented with re-learning hypotheses based on ancient wisdom of learning psychology, e.g. by halving intervals after a memory lapse. Current data shows clearly that this is a harmful and time-wasting approach.

In addition to memory lapses, there is also a strong correlation between the length of the post-lapse interval and retrievability (R). This should be obvious considering that substantial delay in review will result in "undeserved" lapse on items that might be then quite easy to re-learn. Algorithm SM-17 takes retrievability dimension into account.

In the light of the above, SuperMemo uses a separate matrix for post-lapse stabilities: PLS[] with Lapse and Retrievability dimensions. The first interval after scoring a failing grade is then determined as follows:

Int[1]:=PLS[Lapses,R]

where:

Retrievability

Theoretical retrievability

In theory, retrievability R should be easy to derive from stability and interval:

R[n]:=exp(-k*t/S[n-1])

where:

That neat theoretical approach is made a bit more complex when we consider that forgetting may not be perfectly exponential if items are difficult or with mixed difficulty. In addition, forgetting curves in SuperMemo can be marred by user strategies in all situations where grades are not an ideal reflection of recall/retrievability.

Three measures of retrievability

Theoretical two-component model may be undermined by imperfect difficulty filtering, user strategies, implementation shortcomings (e.g. poor choice of parameters), and imperfect modelling of reality. For that reason SuperMemo uses three different measures of retrievability:

  • Theoretical retrievability (R) as derived from the exponential decline of recall over time (as above). This measure is a perfect reflection of passing time
  • Recall matrix: measured recall R for different levels of item difficulty, memory stability, and theoretical R (or time). This measure is a perfect reflection of average recall (for imperfectly measured memory status parameters)
  • Recall-Grade correlation: all users have their own average level of recall corresponding with all five grades used in SuperMemo. This measure is highly unreliable (for example, there were cases in which higher grades have been shown to be associated with lower recall)

Recall matrix: Algorithm SM-17 accounts for possible non-exponential decay of memory, as well as for D and S estimate errors, by using the Recall[D,S,R] matrix. Recall matrix takes the same arguments as the SInc[] matrix. Recall[D,S,R] is the average recall for items of difficulty D, at stability level S, and at retrievability R. Recall matrix keeps the forgetting curve record with the time axis expressed as retrievability. The main functions of Recall[] matrix are:

  1. to measure the actual level of recall for different retrievability levels and
  2. verify the soundness of the model and its parameters (in the ideal theoretical case Recall should equal retrievability).

The actual recall level statistic is used to amend the theoretical retrievability level derived from stability and time.

SuperMemo 17: The Recall[] matrix graph showing that actual recall differs from predicted retrievability
SuperMemo 17: The Recall[] matrix graph showing that actual recall differs from predicted retrievability

Figure: The Recall[] matrix graph shows that the actual recall differs from predicted retrievability. For higher stabilities and difficulties, it is harder to reach the desired recall level.

Recall-Grade correlation: For individual repetitions, retrievability can be checked against the average recall level registered for individual grades at repetitions. Success and failure of memory, as defined in the grade scale are also used to measure binary recall (0 - failure, 1 - success). Algorithm SM-17 uses both the recall-grade correlation and binary recall to add to the set of information needed to estimate memory stability. Grades carry more information, while binary recall is a better reflection of forgetting. Those two must be weighed up when looking at grade-derived estimate of R.

Retrievability estimate

All the three measures of retrievability can be weighed up to provide for the best estimate of R that is used in further estimates of stability and stability increase. Weights used in the estimate will change with inflow of new information (e.g. repetition cases registered in the Recall matrix). The ultimate goal of weight parameters is to minimize the deviation between the model and the grades scored in learning.

A careful distinction must be made between all measures of recall. Theoretical retrievability (R) concept is essential for indexing optimization matrices as R is the ultimate target when converging onto the theoretical model of memory. At the other extreme, stability estimates are item-specific and need to take into account past recall statistics and actual grades scored by a particular item in the learning process.

In Algorithm SM-18, the role of grade-recall correlations was diminished and predicted retrievability is largely determined by the theoretical value (when data is unavailable), and the value derived from the Recall matrix (when data keeps arriving). For a mature collection, the value of predicted retrievability approaches the value derived form the Recall[] matrix. In fringe cases, the predicted value will differ substantially from theoretical retrievability. For example, when leeches drive down recall even after a day's long interval, the prediction may stand at R=76%, while the theoretical value is R=99%. In contrast, for material with excellent recall, predicted R may be much higher than the theoretical R computed for very long intervals. This high estimate for high quality material is the main reason for positive measures of R-metric in SuperMemo 18.

Stability

The interval and the resulting grade can be used to draw conclusions about the stability estimate. If the interval is greater than stability, good grades indicate stability underestimate. Conversely, short intervals and poor grades may indicate an overestimate. This type of reasoning underlay the first attempt to compute the stability increase function in this 2005 article. However, due to being applicable to a subset of repetition scenarios, this information is insufficient to estimate stability for all items and all repetitions. In addition, conclusions drawn in 2005 differ substantially from the new data based on far more accurate recall-verified retrievability estimate (as described above).

The second and more reliable source of information about stability is the value derived from retrievability estimate modified along prior recall measurements. Stability can be estimated from retrievability level and the stability estimate value before the repetition. This is the most reliable estimate.

Last but not least, all repetitions produce a new value of estimated stability. That pre-repetition estimate is the third source of information on the actual level of stability.

Those three sources of information are weighed up for reliability (e.g. on the number of cases in the Recall matrix used to modify R, or the number of cases in the SInc[] matrix used to generate pre-repetition stability). The actual parameters used in weighing up information have been computed using optimization methods targeted at minimizing predicted grade deviation in larger collections.

Stability increase

Stability increase function is used to compute intervals in Algorithm SM-17.

The stability increase function is expressed in Algorithm SM-17 using three different methods:

  • the theoretical formula, which is a multiplicative compound expression of the impact of all its three D, S, R arguments
  • the matrix SInc[] built on the basis of data collected from collection's repetition histories
  • the best compromise function (BestSInc) that combines the information from the matrix, with support of neighboring matrix entries where data is missing, and with support of the theoretical function where no data is available

Due to its monotonicity, theoretical formula is used in all hill-climbing procedures which are employed richly in the algorithm. It is also used exclusively in new collections where no learning data is available. The SInc[] matrix is the truest expression of stability changes in user's particular collection. The SInc[] matrix has the greatest research value and is used in all contexts where true properties of memory need to be taken into account. The matrix is the key to Alg-SM17 adaptability. Finally, the ultimate compromise BestSInc() function is a statement on what is best known about the memory properties and learning strategies of the user and his or her particular collection at any given moment in time.

The choice between the three expressions of SInc[], and the choice of compromise parameters in BestSInc() are essential for the ultimate efficiency of the algorithm. Those choices and parameters are still in the process of being optimized.

The SInc[] matrix can be computed using three different methods:

  • in case of lack of data, from a theoretical universal formula, for all new users
  • incrementally during learning while collecting repetition data and executing Algorithm SM-17
  • wholesale by recomputing stability increase noted in all past repetitions (esp. for users of earlier versions of SuperMemo)

The recommended approach for users of older SuperMemo is to compute a new SInc[] matrix as the first step and then proceed with incremental changes during a course of standard learning with Algorithm SM-17.

Interval

Intervals in Algorithm SM-17 are computed using stability increase function SInc() (the best compromise variant is used as described above as BestSInc).

The following formula is used:

Int[n]:=S[n-1]*SInc[D[n-1],S[n-1],R[n]]*ln(1-rFI/100)/ln(0.9)

where:

This formula is analogous to similar formulas in all versions of SuperMemo beginning with Algorithm SM-0. The role of the stability increase SInc[] was played earlier by E-Factors (based on item difficulty) and O-Factors (derived from forgetting curves).

Data

  • item repetition histories (date and grade)
  • recall matrix: Recall[D,S,R]
  • stability increase matrix: SInc[D,S,R] (i.e. the matrix representation of the stability increase function)

The Algorithm: Outline

The following procedure can be used to determine the status of memory (DSR status) at any moment of time for a given history of repetitions. Once the DSR status is known, we can determine the next interval using criteria of choice, e.g. forgetting index, maximization of stability, long-term workload minimization, etc.

  1. estimate item difficulty D using the history repetitions for that item
  2. determine startup stability S0 using the history of repetitions
  3. for all repetition history records repeat the steps below
  4. compute theoretical retrievability Rt using current stability estimate Sw and the interval Int
  5. update Recall[] matrix using D, Sw[n-1], Rt with the grade-derived recall
  6. compute recall-based retrievability Rr
  7. compute grade-derived retrievability Rg
  8. estimate weighted Rw from Rt, Rr, and Rg
  9. compute Rw-based stability Sr
  10. compute SInc-based Se (Se=Sw[n-1]*SInc[D,Sw[n-1],Rw])
  11. compute interval derived stability Si
  12. estimate weighted Sw from Sr, Se, and Si
  13. compute the stability increase SInc on the basis of Sw change
  14. update Sinc[] matrix using D, Sw, Rw with the new SInc value
  15. compute new interval using Int:=Sw*SInc[D,Sw,Rw]
  16. go back computing Rt step

Note that replacing Int:=Int[n-1]*SInc[D,Sw,Rw] with Int:=Sw*SInc[D,Sw,Rw] frees the user from ever worrying about manual intervention in the length of intervals (e.g. before an exam, or when encountering knowledge in real life, etc.). This is one of the key advantages of Algorithm SM-17 over all prior versions of the algorithm.

Metrics: First year of testing

SuperMemo: The ultimate performance metric for Algorithm SM-17 against the old Algorithm SM-15
SuperMemo: The ultimate performance metric for Algorithm SM-17 against the old Algorithm SM-15

Figure: R-Metric graph demonstrates superiority of Algorithm SM-17 over the old Algorithm SM-15 for the presented collection used in the testing period of 12 months before the release of SuperMemo 17. It was plotted using 10,699 data points (i.e. repetition cases with data from both algorithms), and smoothed up to show the trends. One spot beneath the line of 0 at the vertical axis (R-metric<0) corresponds with a moment in time when the previous version of the algorithm appeared superior (in this case, due to a small sample of repetitions). Some positive and negative trends correspond with changes in the algorithm as data were collected in the new algorithm's testing period. A gradual increase in the metric in the months Feb-May 2016, might be a statistical aberration, or it might be the result of new interval values and a bigger R-metric for intervals departing from the optimum used in earlier SuperMemos. The latter interpretation might suggest that the benefits of Algorithm SM-17 can gradually increase over time.

History of the algorithm

Although the presented algorithm may seem complex, you should find it easier to follow its steps once you understand the evolution of individual concepts such as E-Factor, matrix of optimum intervals, optimum factors, forgetting curves, or mid-interval repetition:

Here is a brief history of SuperMemo Algorithm from the birth of spaced repetition in 1985:

  • 1985 - Paper-and-pencil version of SuperMemo was formulated (see: Birth of spaced repetition). Repetitions of whole pages of material proceeded along a fixed table of intervals. See also: SuperMemo on paper
  • 1987 - First computer implementation of SuperMemo makes it possible to divide material into individual items. Items are classified into difficulty categories by means of E-Factors. Each difficulty category has its own optimum spacing of repetitions (see: SuperMemo 1.0 for DOS (1987))
  • 1989 - SuperMemo 4 was able to modify the function of optimum intervals depending on the student's performance (Algorithm SM-4). This was then the first algorithm in which the function of optimal intervals was adaptable
  • 1989 - SuperMemo 5 replaced the matrix of optimum intervals with the matrix of optimal factors (an optimum factor is the ratio between successive intervals). This approach accelerated the adaptation of the function of optimum intervals (Algorithm SM-5)
  • 1991 - SuperMemo 6 derived optimal factors from forgetting curves plotted for each entry of the matrix of optimum factors. This could dramatically speed up the convergence of the function of optimum intervals to its ultimate value (Algorithm SM-6). This was then the first adaptable algorithm that would use regression to find the best fit to the actual memory performance data. Unlike SuperMemo 5, which could keep converging and diverging depending on the quality of the learning material and the learning process, SuperMemo 6 would get closer to the student's ultimate memory model with each day of learning
  • 1995 - SuperMemo 8 capitalized on data collected by users of SuperMemo 6 and SuperMemo 7 and added a number of improvements that strengthened the theoretical validity of the function of optimum intervals and made it possible to accelerate its adaptation, esp. in the early stages of learning (Algorithm SM-8). New concepts:
    • replacing E-Factors with absolute difficulty factors: A-Factors. Item difficulty was thus defined in terms of actual properties of human memory, and would not depend on the average difficulty of the learning material
    • fast approximation of A-Factors from the First Grade vs. A-Factor correlation graph and Grade vs. Forgetting index graph. This makes it possible to quickly guess item's difficulty before more data is available
    • real-time adjustment of the matrix of optimal factors based on the power approximation of the decline of optimum factors
  • 2002 - SuperMemo 11 (aka SuperMemo 2002) introduced the first SuperMemo algorithm that is resistant to interference from delay or advancement of repetitions: Algorithm SM-11. This makes it possible to safely delay repetitions (Postpone) or advance repetitions (Review):
  • 2005 - SuperMemo 12 (aka SuperMemo 2004) introduced boundaries on A and B parameters computed from the Grade vs. Forgetting Index data. This plugs up a weakness in the algorithm that showed when importing repetitions from other applications (e.g. open source MemAid). If a large number of easy repetitions occurred at unnaturally long intervals (as after pre-training with another application), the graph might show reversed monotonicity that would temporarily affect the estimation of A-Factors (the speed of self-correction would be reversely proportional to the amount of flawed data). When boundaries are imposed, self-correction is instant, and the accuracy of A-Factor estimation increases with each repetition executed in SuperMemo
  • 2011 - Algorithm SM-15 in SuperMemo 15 eliminated two weaknesses of Algorithm SM-11 that would show up in heavily overloaded collections with very large item delays:
    • U-Factors now allow of correctly interpreting repetition delays of up to 15 years (previously only 60 days)
    • forgetting curves are now corrected for repetition delays beyond the maximum registered U-Factor value (preventing failed grades in delayed repetitions decreasing the estimates of the optimum interval for standardly-scheduled items in the same category)
  • 2015 - Algorithm SM-17 is the first algorithm based entirely on the two component model of memory. It is the largest qualitative change in history of spaced repetition
  • 2019 - Algorithm SM-18 improves the approximation of the stabilization function. It also changes the way item difficulty is computed

See also