Algorithm SM-15

This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)

Introducing Algorithm SM-15

Algorithm SM-15 is a spaced repetition algorithm used in SuperMemo 15 (2011) and SuperMemo 16 (2013). Algorithm SM-15 evolved over 18 years from Algorithm SM-8.

Algorithm SM-15 runs in parallel with Algorithm SM-17 in SuperMemo 17, and can be used to demonstrate the superiority of Algorithm SM-17.

Below you will find a general outline of the 8th major formulation of the repetition spacing algorithm, released in 2011. It is referred to as Algorithm SM-15 since it was first implemented in SuperMemo 15. Although the increase in complexity of Algorithm SM-15 as compared with its predecessors (e.g. Algorithm SM-6) is incomparably greater than the expected benefit for the user, there is substantial theoretical and practical evidence that the increase in the speed of learning resulting from the upgrade may fall in the range of 30% to 50%. In addition, Algorithm SM-15 eliminates various problems with repetition delay and repetition advancement that were evident in Algorithm SM-8.

Algorithm SM-15

General idea

SuperMemo begins the effort to compute the optimum inter-repetition intervals by storing the recall record of individual items (i.e. grades scored in learning). This record is used to estimate the current strength of a given memory trace and the difficulty of the underlying piece of knowledge (called an item). The item difficulty expresses the complexity of memories, and reflects the effort needed to produce unambiguous and stable memory traces. SuperMemo takes the requested recall rate as the optimization criterion (e.g. 95%), and computes the intervals that satisfy this criterion. The function of optimum intervals is represented in matrix form (the OF matrix) and is subject to modification based on the results of the learning process. Although satisfying the optimization criterion is relatively easy, the complexity of the algorithm stems from the need to obtain the maximum speed of convergence possible in light of the known memory models.

Steps of Algorithm SM-15

Here are the main steps of Algorithm SM-15:

  1. Optimum interval: Inter-repetition intervals are computed using the following formula:

    I(1)=OF[1,L+1]
    I(n)=I(n-1)*OF[n,AF]

    where:

    • OF - matrix of optimal factors, which is modified in the course of repetitions
    • OF[1,L+1] - value of the OF matrix entry taken from the first row and the L+1 column
    • OF[n,AF] - value of the OF matrix entry that corresponds with the n-th repetition, and with item difficulty AF
    • L - number of times a given item has been forgotten (from "memory Lapses")
    • AF - number that reflects absolute difficulty of a given item (from "Absolute difficulty Factor")
    • I(n) - n-th inter-repetition interval for a given item
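
    For illustration, the interval formulas above can be expressed in a few lines of Python. This is a sketch only: of_matrix is a plain dict-of-dicts stand-in for SuperMemo's internal OF matrix, and the indexing is simplified (real repetition categories are fractional and interpolated, as described in Point 3).

      # Sketch of Point 1; of_matrix[n][af] stands for OF[n,AF]
      def first_interval(of_matrix, lapses):
          # I(1) = OF[1, L+1]: the first interval depends on the lapse count L
          return of_matrix[1][lapses + 1]

      def next_interval(prev_interval, of_matrix, n, af):
          # I(n) = I(n-1) * OF[n, AF]
          return prev_interval * of_matrix[n][af]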
  2. Advanced repetitions: Because of possible advancement in executing repetitions (e.g. forced review before an exam), the actual optimum factor (OF) used to compute the optimum interval is decremented by dOF using formulas that account for the spacing effect in learning:

    dOF=dOFmax*a/(thalf+a)
    dOFmax=(OF-1)*(OI+thalf-1)/(OI-1)

    where:

    • dOF - decrement to OF resulting from the spacing effect
    • a - advancement of the repetition in days as compared with the optimum schedule (note that there is no change to OF if a=0, i.e. when the repetition takes place at the optimum time)
    • dOFmax - asymptotic limit on dOF for infinite a (note that for a=OI-1 the decrement will be OF-1, which corresponds to no increase in the inter-repetition interval)
    • thalf - advancement at which there is half the expected increase to synaptic stability as a result of a repetition (presently this value corresponds roughly to 60% of the length of the optimum interval for well-structured material)
    • OF - optimum factor (i.e. OF[n,AF] for the n-th interval and a given value of AF)
    • OI - optimum interval (as derived from the OF matrix)
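
    The spacing-effect correction can be sketched in Python as follows (an illustration only; variable names follow the definitions above, and oi is assumed to be greater than 1):

      # Sketch of Point 2: decrement OF for a repetition advanced by a days
      def advanced_of(of, oi, a, t_half):
          if a <= 0:
              return of  # no change when the repetition is not advanced
          dof_max = (of - 1) * (oi + t_half - 1) / (oi - 1)
          dof = dof_max * a / (t_half + a)
          return of - dof

    For a=OI-1 (i.e. a repetition executed after a one-day interval), the function returns 1, which corresponds to no increase in the inter-repetition interval, as noted for dOFmax above.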
  3. Delayed repetitions: Because of possible delays in executing repetitions, the OF matrix is not actually indexed with repetitions but with repetition categories. For example, if the 5-th repetition is delayed, the OF matrix is used to compute the repetition category, i.e. the theoretical value of the repetition number that corresponds with the interval used before the repetition. The repetition category may, for example, assume the value 5.3, and we will arrive at I(5)=I(4)*OF[5.3,AF], where OF[5.3,AF] has an intermediate value derived from OF[5,AF] and OF[6,AF] (see the sketch below)
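
    Assuming linear interpolation between neighboring entries (the text does not specify the exact interpolation used), the lookup for a fractional repetition category can be sketched as:

      import math

      # Sketch of Point 3: e.g. OF[5.3,AF] interpolated from OF[5,AF] and OF[6,AF]
      def of_for_category(of_matrix, rep_category, af):
          lo = math.floor(rep_category)
          frac = rep_category - lo
          return (1 - frac) * of_matrix[lo][af] + frac * of_matrix[lo + 1][af]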
  4. Matrix of optimum intervals: SuperMemo does not store the matrix of optimum intervals as some earlier versions did. Instead, it keeps a matrix of optimal factors that can be converted to the matrix of optimum intervals (as in the formula from Point 1). The matrix of optimal factors OF used in Point 1 has been derived from the mathematical model of forgetting and from similar matrices built on data collected over years of repetitions in collections created by a number of users. Its initial setting corresponds with values found for a less-than-average student. During repetitions, upon collecting more and more data about the student's memory, the matrix is gradually modified to closely approach the actual student's memory properties. After years of repetitions, new data can be fed back to generate a more accurate initial OF matrix. In SuperMemo 18, this matrix can be viewed in 3D with Toolkit : Statistics : Analysis : 3-D Graphs : O-Factor Matrix
  5. Item difficulty: The absolute item difficulty factor (A-Factor), denoted AF in Point 1, expresses the difficulty of an item (the higher it is, the easier the item). It is worth noting that AF=OF[2,AF]. In other words, AF denotes the optimum interval increase factor after the second repetition. This is also equivalent to the highest interval increase factor for a given item. Unlike E-Factors in Algorithm SM-6 employed in SuperMemo 6 and SuperMemo 7, A-Factors express absolute item difficulty and do not depend on the difficulty of other items in the same collection of study material (see FAQs for an explanation)
  6. Deriving OF matrix from RF matrix: Optimum values of the entries of the OF matrix are derived through a sequence of approximation procedures from the RF matrix, which is defined in the same way as the OF matrix (see Point 1), with the exception that its values are taken from the real learning process of the student for whom the optimization is run. Initially, the matrices OF and RF are identical; however, entries of the RF matrix are modified with each repetition, and a new value of the OF matrix is computed from the RF matrix by using approximation procedures. This effectively produces the OF matrix as a smoothed form of the RF matrix. In simple terms, the RF matrix at any given moment corresponds to its best-fit value derived from the learning process; however, each entry is considered a best-fit entry on its own, i.e. in abstraction from the values of other RF entries. At the same time, the OF matrix is considered a best-fit as a whole. In other words, the RF matrix is computed entry by entry during repetitions, while the OF matrix is a smoothed copy of the RF matrix
  7. Forgetting curves: Individual entries of the RF matrix are computed from forgetting curves approximated for each entry individually. Each forgetting curve corresponds with a different value of the repetition number and a different value of A-Factor (or memory lapses in the case of the first repetition). The value of an RF matrix entry corresponds to the moment in time at which the forgetting curve passes the knowledge retention point derived from the requested forgetting index. For example, for the first repetition of a new item, if the forgetting index equals 10%, and after four days the knowledge retention indicated by the forgetting curve drops below the 90% value, the value of RF[1,1] is taken as four. This means that all items entering the learning process will be repeated after four days (assuming that the matrices OF and RF do not differ at entry [1,1]). This satisfies the main premise of SuperMemo: that the repetition should take place at the moment when the forgetting probability equals 100% minus the forgetting index stated as a percentage. In SuperMemo 18, forgetting curves can be viewed with Toolkit : Statistics : Analysis : Forgetting Curves (or in 3-D with Toolkit : Statistics : Analysis : 3-D Curves):
    Figure: SuperMemo: Tools : Statistics : Analysis : Forgetting Curves, with graphs for 20 repetition number categories multiplied by 20 A-Factor categories. Shown: a uniform forgetting curve for a single memory stability and item difficulty level.

    Figure: Exemplary 3-D graph of forgetting curves for A-Factor=3.6. Shown: a 3D representation of the family of forgetting curves for a single item difficulty and varying memory stability levels (normalized for U-Factor).

    Figure: In SuperMemo 18, forgetting curves can be normalized over A-Factors at different repetition categories. As a result, you can display (1) your cumulative forgetting curve (blue dots) and (2) its negative exponential approximation used by SuperMemo (red line). Shown: a cumulative normalized forgetting curve for all registered repetition cases in a single collection.

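    Returning to Point 7, the way an RF entry is read off a forgetting curve can be sketched as follows. This is a minimal illustration: retention_by_day is a hypothetical mapping from days elapsed to measured recall rate for one repetition/A-Factor category, not a SuperMemo structure.

      # Sketch of Point 7: find the day at which retention falls below
      # the level implied by the requested forgetting index
      def rf_from_curve(retention_by_day, requested_fi=0.10):
          threshold = 1.0 - requested_fi  # e.g. 90% retention for FI=10%
          for day in sorted(retention_by_day):
              if retention_by_day[day] < threshold:
                  return day
          return max(retention_by_day)  # curve never crossed the threshold

    With the example from the text, a curve that drops below 90% retention after four days yields RF[1,1]=4.
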
  8. Deriving OF matrix from the forgetting curves: The OF matrix is derived from the RF matrix by:
    1. fixed-point power approximation of the R-Factor decline along the RF matrix columns (the fixed point corresponds to the second repetition, at which the approximation curve passes through the A-Factor value),
    2. for all columns, computing D-Factor which expresses the decay constant of the power approximation,
    3. linear regression of D-Factor change across the RF matrix columns, and
    4. deriving the entire OF matrix from the slope and intercept of the straight line that makes up the best fit in the D-Factor graph. The exact formulas used in this final step go beyond the scope of this illustration.
    Note that the first row of the OF matrix is computed in a different way. It corresponds to the best-fit exponential curve obtained from the first row of the RF matrix.
    All the above steps are performed after each repetition. In other words, the theoretically optimum value of the OF matrix is updated as soon as new forgetting curve data is collected, i.e. at the moment during the repetition when the student, by providing a grade, reports correct recall or wrong recall (i.e. forgetting). (In Algorithm SM-6, a separate procedure, Approximate, had to be used to find the best-fit OF matrix, and the OF matrix used at repetitions might differ substantially from its best-fit value)
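
    The fixed-point power approximation from the first two sub-steps can be sketched as below. The exact functional form is not given in this text; the sketch assumes OF(n) = AF*(n-1)^(-D), which passes through the A-Factor value at n=2 as required, and fits the decay constant D (the D-Factor) by least squares on the log-transformed R-Factors:

      import math

      # Sketch of Point 8, sub-steps 1-2: fit the D-Factor for one RF column
      def fit_d_factor(rf_column, af):
          # rf_column: {repetition number n (n >= 3): measured R-Factor}
          num = den = 0.0
          for n, rf in rf_column.items():
              x = math.log(n - 1)
              num -= math.log(rf / af) * x
              den += x * x
          return num / den if den else 0.0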
  9. Item difficulty: The initial value of A-Factor is derived from the first grade obtained by the item and from the correlation graph of first grades and A-Factors (the G-AF graph). This graph is updated after each repetition in which a new A-Factor value is estimated and correlated with the item's first grade. Subsequent approximations of the real A-Factor value are made after each repetition by using grades, the OF matrix, and a correlation graph that shows the correspondence of the grade with the expected forgetting index (the FI-G graph). The grade used to compute the initial A-Factor is normalized, i.e. adjusted for the difference between the actually used interval and the optimum interval for a forgetting index equal to 10%
  10. Grades vs. expected forgetting index correlation: The FI-G graph is updated after each repetition by using the expected forgetting index and the actual grade scores. The expected forgetting index can easily be derived from the interval used between repetitions and the optimum interval computed from the OF matrix. The higher the value of the expected forgetting index, the lower the grade. From the grade and the FI-G graph (see: FI-G graph in Tools : Statistics : Analysis), we can compute the estimated forgetting index, which corresponds to the post-repetition estimation of the forgetting probability of the just-repeated item at the hypothetical pre-repetition stage. Because of the stochastic nature of forgetting and recall, the same item might or might not be recalled depending on the current overall cognitive status of the brain, even if the strength and retrievability of memories of all contributing synapses were identical. This way we can speak of the pre-repetition recall probability of an item that has just been recalled (or not). This probability is expressed by the estimated forgetting index (see the sketch below)
    Figure: SuperMemo: Tools : Statistics : Analysis : Graphs : Grade vs. Forgetting index graph
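
    Assuming a negative exponential forgetting curve (as in the approximations above), the expected forgetting index for a non-optimally timed repetition can be sketched as follows. The formula is an illustration derived from that assumption, not a quote of SuperMemo's actual procedure:

      # Sketch for Point 10: expected forgetting index after using
      # used_interval when the optimum interval targets requested_fi
      def expected_forgetting_index(used_interval, optimum_interval, requested_fi=0.10):
          # R(t) = exp(-k*t), with R(optimum_interval) = 1 - requested_fi
          retention = (1.0 - requested_fi) ** (used_interval / optimum_interval)
          return 1.0 - retention

    For used_interval equal to optimum_interval, the function returns requested_fi, as expected.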
  11. Computing A-Factors: From (1) the estimated forgetting index, (2) length of the interval and (3) the OF matrix, we can easily compute the most accurate value of A-Factor. Note that A-Factor serves as an index to the OF matrix, while the estimated forgetting index allows one to find the column of the OF matrix for which the optimum interval corresponds with the actually used interval corrected for the deviation of the estimated forgetting index from the requested forgetting index. At each repetition, a weighted average is taken of the old A-Factor and the new estimated value of the A-Factor. The newly obtained A-Factor is used in indexing the OF matrix when computing the new optimum inter-repetition interval
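
    A minimal sketch of the A-Factor update in Point 11 (the text describes a weighted average but does not give the weights; the fixed weight below is an assumption for illustration):

      # Sketch of Point 11: blend the old A-Factor with the new estimate
      def update_a_factor(old_af, estimated_af, new_weight=0.3):
          # new_weight is hypothetical; SuperMemo's weighting is not specified here
          return (1 - new_weight) * old_af + new_weight * estimated_af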

Items vs. topics

Algorithm SM-15 is used only to compute the intervals between repetitions of items. Topics are reviewed at intervals computed with an entirely different algorithm (not described here). The timing of topic review is optimized with a view to managing the sequence of reading, and is not aimed at aiding memory. Long-term memories are formed in SuperMemo primarily with the help of items, which are reviewed along the schedule computed by Algorithm SM-15.

Summary

Repetitions result in computing a set of parameters characterizing the memory of the student: the RF matrix, the G-AF graph, and the FI-G graph. They are also used to compute A-Factors of individual items, which characterize the difficulty of the learned material. The RF matrix is smoothed to produce the OF matrix, which in turn is used in computing the optimum inter-repetition interval for items of different difficulty (A-Factor) and different numbers of repetitions (or memory lapses in the case of the first repetition). Initially, all of the student's memory parameters are taken as for a less-than-average student (less-than-average yields faster convergence than average or more-than-average), while all A-Factors are assumed to be equal (unknown).

Optimization solutions used in Algorithm SM-15 have been perfected over 24 years of using the SuperMemo method with computer-based algorithms (first implementation: December 1987). This ensures that the convergence of the starting memory parameters with the actual parameters of the student proceeds in a very short time. In addition, Algorithm SM-15 includes mechanisms that make it insensitive to interference from deviations from the optimum repetition timing (e.g. delayed or advanced repetitions). It can also accept an uninitialized learning process imported from other applications (e.g. other SuperMemos). The introduction of A-Factors and the use of the G-AF graph have greatly enhanced the speed of estimating item difficulty. The adopted solutions are the result of constant research into new algorithmic variants.

Algorithm SM-15 is constantly being perfected in successive releases of SuperMemo, esp. to account for newly collected repetition data, convergence data, input parameters, instabilities, etc. In particular, new procedures are being researched for capitalizing on sleep log data collected by SuperMemo. Making repetitions at the optimum circadian time should allow a substantial increase in optimum intervals, and further acceleration of the learning process in those individuals who opt to log their sleep data.

History of the algorithm

Although the presented algorithm may seem complex, you should find it easier to follow its steps once you understand the evolution of the individual concepts involved, such as E-Factor, the matrix of optimum intervals, optimum factors, forgetting curves, or mid-interval repetition.

Here is a brief history of the SuperMemo algorithm from the birth of spaced repetition in 1985:

  • 1985 - Paper-and-pencil version of SuperMemo was formulated (see: Birth of spaced repetition). Repetitions of whole pages of material proceeded along a fixed table of intervals. See also: SuperMemo on paper
  • 1987 - First computer implementation of SuperMemo makes it possible to divide material into individual items. Items are classified into difficulty categories by means of E-Factors. Each difficulty category has its own optimum spacing of repetitions (see: SuperMemo 1.0 for DOS (1987))
  • 1989 - SuperMemo 4 was able to modify the function of optimum intervals depending on the student's performance (Algorithm SM-4). It was the first algorithm in which the function of optimum intervals was adaptable
  • 1989 - SuperMemo 5 replaced the matrix of optimum intervals with the matrix of optimal factors (an optimum factor is the ratio between successive intervals). This approach accelerated the adaptation of the function of optimum intervals (Algorithm SM-5)
  • 1991 - SuperMemo 6 derived optimal factors from forgetting curves plotted for each entry of the matrix of optimum factors. This could dramatically speed up the convergence of the function of optimum intervals to its ultimate value (Algorithm SM-6). It was the first adaptable algorithm to use regression to find the best fit to the actual memory performance data. Unlike SuperMemo 5, which could keep converging and diverging depending on the quality of the learning material and the learning process, SuperMemo 6 would get closer to the student's ultimate memory model with each day of learning
  • 1995 - SuperMemo 8 capitalized on data collected by users of SuperMemo 6 and SuperMemo 7 and added a number of improvements that strengthened the theoretical validity of the function of optimum intervals and made it possible to accelerate its adaptation, esp. in the early stages of learning (Algorithm SM-8). New concepts:
    • replacing E-Factors with absolute difficulty factors: A-Factors. Item difficulty was thus defined in terms of actual properties of human memory, and would not depend on the average difficulty of the learning material
    • fast approximation of A-Factors from the First Grade vs. A-Factor correlation graph and the Grade vs. Forgetting index graph. This makes it possible to quickly estimate an item's difficulty before more data is available
    • real-time adjustment of the matrix of optimal factors based on the power approximation of the decline of optimum factors
  • 2002 - SuperMemo 11 (aka SuperMemo 2002) introduced the first SuperMemo algorithm that is resistant to interference from the delay or advancement of repetitions: Algorithm SM-11. This makes it possible to safely delay repetitions (Postpone) or advance repetitions (Review)
  • 2005 - SuperMemo 12 (aka SuperMemo 2004) introduced boundaries on the A and B parameters computed from the Grade vs. Forgetting Index data. This plugs a weakness in the algorithm that showed up when importing repetitions from other applications (e.g. the open-source MemAid). If a large number of easy repetitions occurred at unnaturally long intervals (as after pre-training with another application), the graph might show reversed monotonicity that would temporarily affect the estimation of A-Factors (the speed of self-correction would be inversely proportional to the amount of flawed data). With boundaries imposed, self-correction is instant, and the accuracy of A-Factor estimation increases with each repetition executed in SuperMemo
  • 2011 - Algorithm SM-15 in SuperMemo 15 eliminated two weaknesses of Algorithm SM-11 that would show up in heavily overloaded collections with very large item delays:
    • U-Factors now allow correct interpretation of repetition delays of up to 15 years (previously only 60 days)
    • forgetting curves are now corrected for repetition delays beyond the maximum registered U-Factor value (preventing failed grades in delayed repetitions from decreasing the estimates of the optimum interval for normally scheduled items in the same category)
  • 2015 - Algorithm SM-17 is the first algorithm based entirely on the two component model of memory. It is the largest qualitative change in the history of spaced repetition
  • 2019 - Algorithm SM-18 improves the approximation of the stabilization function. It also changes the way item difficulty is computed

FAQ

See: SuperMemo Algorithm FAQ
