History of the optimization of repetition spacing

From supermemo.guru

This text is part of: "History of spaced repetition" by Piotr Wozniak (June 2018)

Problem with spaced repetition research

The history of research on spaced repetition has been plagued by the following factors:

  • guesses and heuristics used in place of mathematical optimization
  • poor interaction between theory and practice, with science focused on simple experiments and practice focused on simple tools
  • terminological inconsistency that leads to cycles of forgetting and re-discovery!

The list above reflects my own ranking of these factors of failure. Until the arrival of personal computing and the web, it was hard to escape this vicious cycle.

Spaced repetition intuitions

When we asked teenagers a set of questions about how their memory works, a large proportion came up with pretty good guesses about repetition spacing without ever making any measurements. In particular, they often correctly guessed that the first optimum inter-repetition interval might be 1-7 days long and that successive intervals would increase. Moreover, many guessed that the second interval might be a month long and that successive intervals might double. In other words, spaced repetition is a common intuition.

Early memory research

In 1885, Hermann Ebbinghaus made a major contribution to the science of memory. He experimented on himself and came up with the first outline of the forgetting curve. He was also aware of the spacing effect, but he never worked on spaced repetition. I do not credit Hermann with inspiring my work on spaced repetition, as I simply had no idea who he was or what he had accomplished. I designed my own measurement that led to spaced repetition. In an unrelated and forgotten exercise, I also produced my own forgetting curve that might have influenced my thinking. Hermann's curve was much steeper and might actually have discouraged further work (see: Error of Ebbinghaus forgetting curve). Our Adam Mickiewicz University library was well stocked with "ancient" pre-WW2 German literature; however, I knew no German. Mine was an ignorant solo effort. I read about Ebbinghaus later, and mentioned his forgetting curve in my Master's Thesis.

By 1901, in the texts of William James, the superiority of spaced review seemed clear, and it seemed only a matter of time before it would permeate learning theory, with the optimization of spacing as the obvious next step. It was not to be. Not for another 8 decades.

In his popular book of 1932, C.A. Mace suggested a simple spaced repetition schedule: 1 day, 2 days, 4 days, 8 days, etc. Good guess! Mace's effort was forgotten, though, because before the era of the Internet, spaced repetition on "postcards in a pocket" could hardly have been appealing. To get things started, Mace would have had to shine with his own good example and encourage others. He described his excellent ideas about efficient learning as a theory. He never mentioned his own experience. Promoting a new idea in those days was probably not easy. Herr Hitler dominated the news. Perhaps progress in spaced repetition theory was yet another victim of the Nazis?
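Mace's schedule is a doubling series. The minimal Python sketch below assumes that his "etc." simply continues the doubling and that day 0 is the day of first study (both are our assumptions, not his wording). It lists the intervals and the calendar days on which reviews would fall:

```python
from itertools import accumulate


def mace_intervals(n: int) -> list[int]:
    """First n intervals (in days) of a doubling schedule: 1, 2, 4, 8, ...

    Assumption: the schedule keeps doubling beyond the 8 days quoted by Mace.
    """
    return [2 ** k for k in range(n)]


def review_days(n: int) -> list[int]:
    """Day of each review, counted from the day of first study (day 0)."""
    return list(accumulate(mace_intervals(n)))


print(mace_intervals(5))  # [1, 2, 4, 8, 16]
print(review_days(5))     # [1, 3, 7, 15, 31]
```

Because each interval doubles, the n-th review lands on day 2^n - 1, so ten reviews already span nearly three years.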

1960s: The Renaissance

In 1966, Herbert Simon had a peek at Jost's Law, derived around 1897 from Ebbinghaus's work. Simon noticed that the exponential nature of forgetting necessitates the existence of a memory property that today we call memory stability. Simon wrote a short paper explaining his idea and moved on to hundreds of other projects he had on his mind. His text was largely forgotten.
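Why does exponential forgetting imply a stability property? If recall decays as R(t) = exp(-t/S), the single constant S sets the time scale of the whole curve, so it behaves like a property of the memory trace itself. The sketch below assumes that simplest exponential form (neither Jost's Law nor Simon's paper is quoted here as a formula, and SuperMemo's own scaling of S may differ) and computes how long recall stays above a target level:

```python
import math


def retrievability(t: float, stability: float) -> float:
    """Probability of recall t days after a review, assuming R = exp(-t / S)."""
    return math.exp(-t / stability)


def interval_for(target_r: float, stability: float) -> float:
    """Days until recall decays to target_r under the same exponential assumption."""
    return -stability * math.log(target_r)


for s in (5.0, 10.0, 20.0):
    # Doubling stability doubles the time needed to reach the same recall level
    print(f"S = {s:>4}: recall falls to 90% after {interval_for(0.9, s):.2f} days")
```

Under this assumption, the time to reach any given recall level scales linearly with S, which is what makes stability a natural quantity to measure and optimize.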

At roughly the same time, Robert Bjork had a great many innovative ideas about learning and memory. As often happens, he was ahead of his time. Teachers hardly ever listen to psychologists. Students do not even know their names. If Bjork had been a programmer, we might have had the first popular application of spaced repetition a decade earlier. I think he would simply not have let a great idea off the hook. It was Bjork who seems to have been the first to clearly separate retrieval strength and storage strength, in a model analogous to our two component model of memory.

In 1967, Paul Pimsleur could clearly see that spaced repetition could be a great tool for learning word pairs in language learning. Like SuperMemo, he struggled with terminology and used the term "graduated-interval recall". In our "serrated forgetting curve" challenge, Pimsleur came closest, with the earliest known graph of serrated curves, shown in the picture:

Pimsleur's serrated curves

Perhaps we will discover earlier sketches of the idea; however, for technical reasons, the older the print, the fewer graphs it contains. Today, we generate graphs en masse in Excel.

Pimsleur's intervals extended into periods of hours, minutes, and even seconds. This reflected intuition, not measurement. He extended his reasoning from declarative knowledge that can easily be measured (e.g. word pairs) to procedural knowledge and audio-pattern recognition, as in learning pronunciation. SuperMemo solves this problem by separating word-pair learning from pronunciation, spelling, recognition, synonyms, and the like. As a result, e.g. in Advanced English, we never need to reduce intervals below the user's standard startup stability, which rarely drops below a day. For practical reasons, and due to the role of sleep, SuperMemo never uses intervals shorter than 1 day. Sleep is also the main reason why the algorithm uses a 1-day resolution for interval length. SuperMemo makes it possible to review multiple times in a day, but this is done in a subset review, which may occasionally prove useful (e.g. when cramming for an exam).

Pimsleur's interval recommendations differed from those of Mace and of SuperMemo on paper (Algorithm SM-0). They were the result of speculation rather than measurement, and that speculation ranged from solid to poor. Pimsleur aimed at ensuring recall of 60%, which is very low by SuperMemo standards. He bet on a startup stability of 5 seconds, while SuperMemo uses 1-15 days, which is just fine for 90% recall of well-formulated knowledge. Pimsleur's base of interval exponentiation (E-Factor) was 5, whereas it should be 1.4-2.5 in most cases. As a result, Pimsleur's spacing differs dramatically from SuperMemo's and should not be used as a benchmark in algorithmic metrics. In his original paper (1967), Pimsleur proposed intervals of 5 sec., 25 sec., 2 min., 10 min., 1 hour, 5 hours, 1 day, 5 days, 25 days, 4 months, and 2 years. The differences came mostly from practice with materials of a different character (equivalent to high complexity in SuperMemo).
The use of seconds, minutes and hours is tantamount to cramming and is strongly discouraged in SuperMemo. Instead, optimization of knowledge representation is advised.
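Pimsleur's list is essentially a geometric series: start at 5 seconds and multiply by his base of 5. The sketch below converts the quoted intervals to seconds (taking a month as 30 days and a year as 365 days, which are our approximations) and compares them with 5 * 5**k:

```python
MIN, HOUR, DAY = 60, 3600, 86400

# Pimsleur's 1967 intervals as quoted in the text, in seconds.
# Assumption: 1 month = 30 days, 1 year = 365 days.
PIMSLEUR = [5, 25, 2 * MIN, 10 * MIN, HOUR, 5 * HOUR,
            DAY, 5 * DAY, 25 * DAY, 4 * 30 * DAY, 2 * 365 * DAY]


def geometric(start: float, base: float, n: int) -> list[float]:
    """First n terms of start * base**k."""
    return [start * base ** k for k in range(n)]


model = geometric(5, 5, len(PIMSLEUR))
for quoted, predicted in zip(PIMSLEUR, model):
    print(f"{quoted:>10} s quoted vs {predicted:>12.0f} s predicted "
          f"(ratio {quoted / predicted:.2f})")
```

Every quoted interval lies within about 30% of the pure series, which suggests the list was rounded from 5 * 5**k to convenient units.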

In 1969, Alfred Maksymowicz wrote "Read and think". You will not find his book in your library. It was written in Polish for a narrow circle of students of technical universities. It mentioned spaced repetition, forgetting curves, and even how the forgetting index might determine the optimum interval. Maksymowicz proposed 3 days as the first optimum interval. Like many efforts before and after, this good advice remained largely ignored. Students rush to pass an exam and then forget. Cram and dump is a principle by which the pressure of schooling destroys the prospects of good long-term learning. I know of Maksymowicz's book only because I studied at a technical university in Poland and was pretty vocal about my own spaced repetition method. I can only imagine that there have been dozens of other similar texts in which intuitions were formulated as good advice that then remained ignored by the masses. Without this coincidence of time and place, future texts on spaced repetition might never have noticed that Maksymowicz existed. Maksymowicz might have been inspired by Pimsleur, Mace, his own intuition, or other texts of which I have no knowledge. Maksymowicz lends credence to the words of Szafraniec, a SuperMemo skeptic: "all has occurred before".

1972: Leitner box

The greatest practical and algorithmic success in the area of spaced review before SuperMemo can be attributed to Sebastian Leitner. In 1972, he proposed the Leitner box system. In a Leitner system, flashcards are prioritized and sorted into boxes corresponding to different stability levels. The Leitner system has one huge advantage over the theoretical advice dished out before it: it was practical. It was a system anyone could use with little introduction. Even SuperMemo on paper (1985) seems complex in comparison.


Figure: An incorrect mutation of the Leitner system where failed answers are moved back by one box only (source: Wikipedia). This variant was in use in Duolingo for a while.
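The box mechanics fit in a few lines of Python. The sketch assumes five boxes (a common layout, not taken from the text) and models only where a card moves after an answer: a correct answer promotes it by one box, and a failure sends it back to box 1, or, in the one-box-back variant shown in the figure, down a single box. No interval appears anywhere in the model.

```python
from dataclasses import dataclass

NUM_BOXES = 5  # assumption: a five-box layout


@dataclass
class Card:
    prompt: str
    box: int = 1  # new cards start in the first box


def answer(card: Card, correct: bool, one_box_back: bool = False) -> Card:
    """Move a card after one answer.

    correct      -> up one box (capped at the last box)
    wrong        -> back to box 1 (classic rule)
    one_box_back -> on a wrong answer, down one box only (the variant in the figure)
    """
    if correct:
        card.box = min(card.box + 1, NUM_BOXES)
    elif one_box_back:
        card.box = max(card.box - 1, 1)
    else:
        card.box = 1
    return card


classic = answer(Card("der Hund", box=4), correct=False)
variant = answer(Card("der Hund", box=4), correct=False, one_box_back=True)
print(classic.box, variant.box)  # 1 3
```

After the same failure, the variant leaves the card two boxes higher than the classic rule does.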

The Leitner box is not a spaced repetition tool. It is a prioritization tool. There is no concept of an interval, let alone an optimum interval. The name box comes from the original implementation in the form of physical flashcard boxes with no association to the passage of time. When the Leitner box is used regularly on a small collection of flashcards, it simulates the behavior of spaced repetition. If intervals are too short, it leads to cramming. If they get too long, it leads to sub-optimum outcomes. However, in SuperMemo, low-priority material may also be postponed cyclically and yield very long intervals, which reduce the expected stability increase but bring a larger stability increase for items that survive the longer intervals. In the 1990s and early in the new millennium, the Leitner system was used in many successful flashcard applications. As their developers kept tinkering with and improving the review procedures, these apps might have evolved into full-blown spaced repetition systems. Their use declined, though, due to the popularity of SuperMemo's Algorithm SM-2, which turned out to be easy to implement and vastly superior.
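To illustrate why Algorithm SM-2 proved easy to implement, here is a minimal Python sketch following its commonly published description: grades 0-5, a starting E-Factor of 2.5, first intervals of 1 and 6 days, then I(n) = I(n-1) * EF, with the E-Factor adjusted after each successful review and never dropping below 1.3. Some details are our own choices, labeled in the comments: rounding intervals up to whole days, computing the interval before updating the E-Factor, and leaving the E-Factor unchanged on a lapse. Implementations differ on these points.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    repetitions: int = 0  # successful reviews in a row
    interval: int = 0     # days until the next review
    ef: float = 2.5       # E-Factor, the same for every new item


def sm2_review(item: Item, q: int) -> Item:
    """Update an item after a review graded q (0 = blackout, 5 = perfect)."""
    if not 0 <= q <= 5:
        raise ValueError("grade must be in 0..5")
    if q < 3:
        # Lapse: restart the interval sequence; our choice: E-Factor unchanged
        item.repetitions = 0
        item.interval = 1
        return item
    if item.repetitions == 0:
        item.interval = 1
    elif item.repetitions == 1:
        item.interval = 6
    else:
        # I(n) = I(n-1) * EF, rounded up to whole days (our choice: old EF)
        item.interval = math.ceil(item.interval * item.ef)
    item.repetitions += 1
    # EF' = EF + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)), never below 1.3
    item.ef = max(1.3, item.ef + 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    return item


item = Item()
intervals = [sm2_review(item, 5).interval for _ in range(3)]
print(intervals)  # [1, 6, 17]
```

The published procedure also re-drills items graded below 4 on the same day. That step is left out of this sketch.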

Newer software mutations of the Leitner box system may attach intervals to priority boxes, e.g. 16 days for Box #5, but this approach has flaws tantamount to cramming: (1) failure still leads to the regression of intervals, while it should lead to resumed learning, (2) five repetitions in the first month compare poorly with well-formulated knowledge, which may reduce the cost of learning in SuperMemo by 60-80% in the first month alone, and (3) more boxes would be needed. We have seen intervals well beyond the maximum human lifespan in SuperMemo. Lifetime applications need intervals 200 thousand percent longer (roughly 2000 times longer). This is the difference between a permastore interval and 16 days. At an E-factor of 2, 11 extra boxes would be needed to cover a lifetime (2^11 = 2048).
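The box count can be checked with a little arithmetic. Assume the last box holds 16 days, each extra box multiplies the interval by the E-factor, and a lifetime is 2000 times 16 days (32,000 days, about 88 years). That last figure is our reading of "200 thousand percent". The sketch counts how many extra boxes it takes to reach a lifetime:

```python
def extra_boxes(last_interval: float, target: float, ef: float) -> int:
    """Boxes needed to grow last_interval to at least target, multiplying by ef per box."""
    if ef <= 1:
        raise ValueError("E-factor must exceed 1")
    boxes = 0
    interval = last_interval
    while interval < target:
        interval *= ef
        boxes += 1
    return boxes


# Assumption: a lifetime of 2000 x 16 days = 32,000 days (about 88 years)
lifetime = 16 * 2000
print(extra_boxes(16, lifetime, 2.0))  # 11
print(extra_boxes(16, lifetime, 1.6))  # 17
```

At an E-factor of 2, the answer is 11, matching the text. A gentler E-factor of 1.6 would need 17 extra boxes. Counting with a loop rather than a logarithm avoids floating-point surprises when the ratio is an exact power of the E-factor.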

Today, one of the most popular systems for learning languages is Duolingo. For a long while, it used the Leitner system. Today, they employ their own new algorithm based on retrievability predictions. However, they still used the Leitner system as a benchmark. To make matters worse, their benchmark used the reverse transfer of flashcards between priority boxes (i.e. failed cards moved back by one box only, which overestimates post-lapse stability). A normalized Leitner system might be used as a benchmark; however, a simple normalization equivalent to an E-factor of 2 may produce different results than an E-factor of 1.6. In the future, all algorithms should switch to a universal metric proposed by SuperMemo, and Algorithm SM-2 might become a useful metric benchmark that can be implemented in parallel with proprietary solutions. I hope users will demand clarity, statistics, metrics, and full openness in that respect. Incidentally, if you happen to use SuperMemo 17 version 17.4, you can compare Algorithm SM-17 with the Leitner system, Pimsleur, and Algorithm SM-2. Needless to say, if your collection is sizeable enough, the differences are pretty stunning.
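The text does not define "normalized Leitner". One plausible reading, assumed here, attaches a fixed interval to each box that grows geometrically from 1 day by a chosen E-factor. The sketch shows how strongly that single choice shapes the schedule:

```python
def box_intervals(first: float, ef: float, boxes: int) -> list[float]:
    """Interval (days) for boxes 1..boxes.

    Assumption (hypothetical normalization): box k uses first * ef**(k - 1).
    """
    return [first * ef ** k for k in range(boxes)]


for ef in (2.0, 1.6):
    print(ef, [round(days, 1) for days in box_intervals(1.0, ef, 5)])
```

With an E-factor of 2, box 5 holds 16 days, the same figure as the Box #5 example mentioned earlier. With 1.6, it holds only about 6.6 days, so two "normalized" benchmarks could schedule the same card very differently.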

In the 1970s, Tony Buzan focused on structured knowledge with his mind-mapping innovations. Mind maps and SuperMemo would, paradoxically, stand in conflict due to the lack of a good unifying theory. In short, we need good models to understand the world, and we need spaced review to retain the components of those models in the long term. Buzan also had his own ideas about how review should be spaced. When he first encountered SuperMemo in the early 1990s, he instantly agreed with the concept; however, he always preferred to focus on knowledge structure rather than mere review.

1980s: SuperMemo

My own work entered the picture in 1982, when I really got fed up with the never-ending process of forgetting. I wanted to learn biochemistry and physiology. I would read books and make notes, and it would all be for nothing because of forgetting. Even the most important facts could slip my memory at the most unfortunate moment (e.g. an exam). I decided to employ active recall. Instead of just making notes, I would make notes as questions and answers. I could cover the answers and respond using active recall. This dramatically improved learning. This is how it is done in SuperMemo to this day. This new approach had a lovely side effect: it boosted my love of learning.

By 1984, I was fluent enough with my active recall approach to know that complex questions don't work. If you pack too much into the answer, e.g. a long list, you will keep forgetting. This makes for futile learning. I later called that quest for simplicity the "minimum information principle". Today, this principle is one of the first mentioned among the 20 rules of knowledge formulation.

The real breakthrough came in 1985, i.e. exactly 100 years after the publication of Ebbinghaus' dissertation on memory. I wanted to check how the spacing of review affects recall. I needed to figure out the length of optimum intervals between repetitions. Obviously, those intervals exist. I only needed to measure them. The experiment is described here. The experiment was simple, rough, lazy, and hurried. Instead of patiently spending a few years working out all the details, I formulated the first SuperMemo algorithm after 6 months. You can call it the first case of somewhat scientific spaced repetition. My research was based on one person and one type of learning material, but it was universal enough to have many faithful users years later. On Jul 31, 1985, I started learning biochemistry using the new method. This is the birthday of computational spaced repetition. The computer program, SuperMemo for DOS, came in 1987, and the name SuperMemo was proposed in 1988.

In the 1980s, Jaap Murre's Memory Chain Model was one of the early models of memory that might have led to a solid spaced repetition algorithm. It even had its own early application, Captain Mnemo, which might have competed with SuperMemo for priority in the field. Captain Mnemo and OptiLearn are examples of why, in academic environments, great theories are often not followed by practical implementations that could gain wider appeal.

In 1991, SuperMemo World was formed; its beginnings are described here. From that point on, the expansion of spaced repetition has been exponential. By 1999, we started using the term "spaced repetition" instead of the "SuperMemo method". For recent developments at SuperMemo World, see here.