In teaching, it is common for instructors to speak about the people whose work is being taught. Such information helps put the research into historical and cultural perspective. If the person whose work is described is someone with whom students can identify, the work can take on added personal significance. This observation sets the stage for the present article. The person who is the subject of this article—a certain E. R. F. W. Crossman—is someone whose work has been taught to legions of students, yet little is known about him or her.

The two authors of this article have long been interested in Crossman’s scientific discoveries, but neither of us had a picture of Crossman, literally or figuratively. The dearth of information about him or her was at odds with the investigator’s impact. We became curious to know who E. R. F. W. Crossman is or was, and decided to learn about him or her, both for the human-interest side of the story and to see whether this clearly brilliant scientist might have produced other treasures beyond the ones we knew about.

Because of the unusual background of this article, we want to place Crossman’s research in context by indicating where it fits in with other work to which it is related. We also want to tell the story of how we finally managed to learn about Crossman. The outcome of our investigation was eye-opening in many ways.

The plan for the article is as follows. In the next section (the article’s second), we focus on an area where one of Crossman’s best-known projects was extremely influential—visually guided aiming. It happens that this domain was described by the first author in a lecture attended by the second author, and it was there that the two of us decided to find out who Crossman is or was.

In the third section of the article, we discuss the contribution for which Crossman is best known, the Power Law of Learning. Here we discuss Crossman’s famous study of cigar rolling, covered in virtually all cognitive psychology and human factors textbooks. It was the cigar-rolling data, along with other data concerning the time to complete tasks as a function of the amount of practice on the tasks, that led Crossman to endorse the Power Law of Learning. The formula has incited a great deal of discussion, including debate about whether it is the best quantitative account of data concerning practice-related speeding. Regardless of how that controversy plays out, the large amount of attention the formula has received points to the deep interest that Crossman’s theorizing has stirred.

In the article’s fourth section, we turn to the path we took to learn about Crossman. We found ourselves acting like detectives. We describe the steps we took and what we learned about Crossman. The experience was fraught in many ways. Pain as well as pleasure was uncovered in the process.

In the fifth and final section, we provide closing remarks about this remarkable investigator.

Aiming

In early 2018, the second author (hereafter, Markus) traveled from the University of Tübingen (Germany) to visit the first author (hereafter, David) at the University of California, Riverside (USA). During the visit, Markus asked David whether he could attend a lecture David would be giving that day. “Of course,” David answered, appreciating the opportunity to spend more time with Markus and wanting to get comments from Markus about the lecture itself.

The lecture that day happened to be about aiming. The topic was reaching for targets, with special reference to the processes of homing in on targets using one’s eyes and hands. David tried to make clear in the lecture that although reaching for objects may appear simple, it is actually quite complex, at least when studied in detail. Being able to reach for things in the external environment can be challenging, as when a toddler tries to reach for a cup of milk and the cup is likely to spill, or when a Parkinson’s patient tries to move a computer cursor and the cursor’s path is likely to waver.

As David told the class, several theoretical models have been developed to explain aiming, and as he also explained, a number of luminaries in psychological science have delved into this topic and have used it as a springboard to studying skill more broadly. You, the reader of this article, are invited to join the lecture now (recast a bit), to get the main ideas and set the stage for the presentation of Crossman’s work.

Woodworth

The name Robert Woodworth will be familiar to readers of this journal. Woodworth is best known for his book-length review of experimental psychology, the first such publication (Woodworth, 1938). Less well known may be the fact that Woodworth’s doctoral dissertation, which was published as a monograph (Woodworth, 1899), concerned the speed and accuracy of aiming movements.

For his dissertation, Woodworth asked participants to move a hand-held stylus back and forth between two targets at different rates, with and without visual feedback. On the basis of his results, Woodworth drew a distinction between behavior that is ballistic (unaffected by feedback) and behavior that is controlled (based on feedback). This distinction set the stage for the well-known contrast between automatic and controlled processes (e.g., Posner, 1973). For a review of Woodworth’s work on aiming and its sequels, see Elliott, Helsen, and Chua (2001). A useful review of additional research on aiming has been provided by Elliott et al. (2010).

Woodworth’s brilliance as an experimenter, coupled with his scholarship, led to his being called the “Dean of Experimental Psychology” (Hearst, 1979). A biography of Woodworth was prepared by Graham (1967).

Fitts

The next researcher whose work was covered in the lecture was Paul Fitts. Fitts is best known for his work on stimulus–response compatibility, the tendency for responses to be either easy (quick and accurate) or hard (slow and inaccurate), depending on the particular stimulus–response pairings that are tested (Fitts & Deininger, 1954). Stimulus–response compatibility has proven to be one of the most powerful tools for uncovering basic and applied principles of psychological science; for reviews, see Proctor and Vu (2006, 2016). Its applications have even gone so far as to expose implicit bias in thousands of people who have taken the Implicit Association Test (Banaji & Greenwald, 2016; Greenwald, McGhee, & Schwartz, 1998).

Fitts (1954) also studied aiming. In his first aiming experiment, he measured the time to move a hand-held stylus to targets of varying sizes at varying distances from a starting point. The movements were to be made as quickly as possible. Fitts (1954) found that the time to move from the home position to the target grew as the distance increased and as the size of the target decreased. He obtained similar results in further experiments with variants of this basic task.

The quantitative relation that Fitts (1954) proposed between his main independent variables (target distance and target size) and the dependent variable (movement time) was

$$ MT=a+b\bullet {\log}_2\left(\frac{2\bullet A}{W}\right), $$

where MT is the movement time, A is the amplitude (distance) of the needed movement (distance to the center of the target), W is the width of the target along the axis of movement or the diameter of the (circular) target to which aiming is required, and a and b are empirical constants.
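To make the formula concrete, here is a minimal Python sketch of the prediction (the values of a and b are illustrative, not fitted):

```python
import math

def fitts_mt(A, W, a=0.2, b=0.1):
    """Predicted movement time (s) under Fitts' law.

    A: movement amplitude (distance to the target center)
    W: target width along the axis of movement
    a, b: empirical constants (the values used here are illustrative only)
    """
    return a + b * math.log2(2 * A / W)

# Doubling the distance, or halving the target width, adds one "bit" of
# difficulty and hence the same fixed increment (b seconds) to movement time.
print(fitts_mt(A=16, W=2))  # log2(16) = 4 bits -> 0.2 + 0.4 = 0.6 s
print(fitts_mt(A=32, W=2))  # log2(32) = 5 bits -> 0.7 s
```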

Fitts’ formula has come to be called Fitts’ law. It is one of the few “laws” of psychology, that term having been applied to it because of the enormous range of conditions in which it holds. For reviews of Fitts’ law, see Elliott et al. (2010), Rosenbaum (2010), and Schmidt, Lee, Winstein, Wulf, and Zelaznik (2018). In 2004, a special issue of the International Journal of Human–Computer Studies was devoted to the impact of Fitts’ law on human–computer interaction research (Guiard & Beaudouin-Lafon, 2004).

Fitts, like Woodworth, extended his interest from aiming to skill learning more generally. In the final years of his all-too-brief life—he died in 1965, just four days before his 53rd birthday—he proposed a three-stage theory of skill learning. That theory remains one of the most influential theories in psychological science to this day. According to the theory (Fitts, 1964; Fitts & Posner, 1967), skill learning begins with a phase requiring conscious or deliberate control. After that comes a stage in which tuning occurs. Finally, there is a phase requiring little or no conscious or deliberate control.

Crossman and Goodeve

The facts just given about Fitts and Woodworth set the stage for the first work of Crossman’s to be covered in this article, and in the lecture at which the idea to investigate him was spawned. The work is cited in the literature as Crossman and Goodeve (1963/1983). Two dates are given, with a slash between them, because of the unusual nature of the work’s publication (which in turn gives a hint as to Crossman’s obscurity). The work was presented as a talk at the Oxford (England) meeting of the Experimental Psychology Society in July 1963. Yet it was not actually published until 1983, when, thanks to the efforts of Alan Wing (1983), the write-up of the talk, which had previously been passed around as a mimeograph (the precursor of a photocopied document), was finally put into print. Wing (1983, p. 245) wrote in his foreword to the published article, “this paper is probably the most often cited unpublished work in the current literature on human movement control.”

Returning to the lecture given by David and attended by Markus, it will help to note that, before turning to the work of Crossman and Goodeve (1963/1983), David said that when Fitts had offered his formula relating movement time to target distance and size, he did not attempt to explain the temporal dynamics that made the relation possible. Rather, David said, Fitts had couched his formula in terms of information theory.

Information theory was proposed by Shannon and Weaver (1949); it had a tremendous impact on psychological science and still permeates modern technology. The number of bytes on a computer hard drive is the number of eight-bit strings (digital values) that the computer can house. The notion that information can be represented in digital form (as 1s and 0s or “yes” and “no” values) is the heart of information theory. We live in the digital age, largely because of information theory.

The way that information theory was applied to psychological science can be seen in the Hick–Hyman law of choice reaction time, according to which choice reaction time increases linearly with the log2 of the number of stimulus–response alternatives (Hick, 1952; Hyman, 1953; see also Merkel, 1885). The reason Hick and Hyman investigated choice reactions and developed the model they did was that they surmised, on the basis of information theory, that participants would engage in a clear strategy when choosing responses based on stimuli. At least implicitly, participants would subdivide the set of stimuli and responses into successively smaller sets until, ultimately, just one response would be possible given the stimulus on that trial. This claim spawned a great deal of subsequent research, which has also qualified this interpretation (e.g., Rosenbaum, 2014, chap. 5). For example, while many researchers have treated the Hick–Hyman law as a perceptual phenomenon, others have favored an interpretation in terms of the time required to retrieve a particular stimulus–response assignment from memory (Jamieson & Mewhort, 2009).
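For reference, the law’s quantitative form is usually written (our notation; the constants are task-specific) as

$$ RT=a+b\cdot {\log}_2(N), $$

where RT is the choice reaction time, N is the number of equally likely stimulus–response alternatives, and a and b are empirical constants.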

As we just implied, information theory focuses on uncertainty and its resolution. According to information theory, the amount of information in a signal is, roughly, the number of “yes/no” answers required to pick that signal from the possible alternatives. The number of “yes/no” answers (the number of binary digits or “bits”) is the amount of information in the signal.

Seeing the rise of information theory in the late 1940s and early 1950s, Fitts hypothesized that aiming for a target could be viewed as picking out the target from the background it occupied. The farther away a target was, and the smaller it was, the more information it carried. On the basis of such considerations, Fitts suggested that the time to reach a target is proportional to the target’s information content. Critically for the subsequent study by Crossman and Goodeve (1963/1983), Fitts did not speculate about the real-time process by which the target was identified (physically reached).

It was Crossman and Goodeve (1963/1983) who tackled this problem. These authors offered a simple model that did a good job of predicting Fitts’ law. According to the model, which is schematized in Fig. 1, someone performing an aiming task behaves in a systematic fashion. He or she moves his or her stylus a constant proportion, p, of the distance to the target, and keeps doing so until the target is reached. If p is .5, for example, the pen tip goes halfway toward the target center and then stops. If that halfway point is within the target, the task is completed. Otherwise, the pen tip is moved over half the remaining distance, and if that move brings the stylus to the target interior, the aiming is done, and so on. By this analysis, moving to a target can be thought of as a series of corrections. Said another way, moving to a target is a series of (implicit) “yes/no” questions: “Am I in the target now? If not, keep halving the distance (or reducing the remaining distance by p, more generally) until I’ve gotten my pen tip inside the target.” If each move takes a constant amount of time—an added assumption of the iterative corrections model of Crossman and Goodeve (1963/1983)—the relation between movement time, distance, and target size can approximate Fitts’ law.

Fig. 1 Crossman and Goodeve’s (1963/1983) aiming model, for the case of covering p = .5 of the remaining distance to a target

How does the model approximate Fitts’ law? We illustrate this with two graphics used in the lecture attended by Markus. Both were designed by David for pedagogic purposes. Table 1 shows how different movement times arise from different combinations of distance and target size according to Crossman and Goodeve’s (1963/1983) model. Figure 2 shows how those predicted times relate to the times predicted by Fitts’ law.

Table 1 Predicted time to move to a target of width W/2 (cm) whose center is distance A (cm) away from the starting point, along with the distances remaining after the first through the final submoves, for each combination of A and W
Fig. 2 Predicted time (s) as a function of \( {\log}_2\left(\frac{2\cdotp \mathrm{A}}{\mathrm{W}}\right) \) according to Crossman and Goodeve’s (1963/1983) model, illustrated in Table 1
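The logic behind Table 1 and Fig. 2 can be captured in a few lines of code. The following Python sketch is our own illustration, not Crossman and Goodeve’s; the constant submove duration is an arbitrary value chosen only to show the shape of the prediction.

```python
import math

def iterative_mt(A, W, p=0.5, t_submove=0.1):
    """Movement time under the iterative-corrections model.

    Each submove covers proportion p of the distance still remaining to the
    target center and takes a fixed time t_submove (an illustrative value).
    Submoves continue until the pen tip lies within W/2 of the target center.
    """
    remaining = A
    n_submoves = 0
    while remaining > W / 2:
        remaining *= (1 - p)      # cover proportion p of what is left
        n_submoves += 1
    return n_submoves * t_submove

# With p = .5, the predicted time grows in lockstep with log2(2A/W),
# which is exactly the index of difficulty in Fitts' law.
for A, W in [(4, 2), (8, 2), (16, 2), (16, 1)]:
    print(A, W, math.log2(2 * A / W), iterative_mt(A, W))
```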

Crossman and Goodeve’s (1963/1983) model was simple, and it was so on the mark in terms of predicting Fitts’ law that it became a vehicle for the analysis of aiming. Many follow-ups were conducted. For example, the model predicted that corrective submoves would be observed, and they were (Jagacinski, Repperger, Moran, Ward, & Glass, 1980). The model also predicted that Fitts’ law would not apply to very-short-distance movements, and that prediction was also supported (Klapp, 1975).

Ironically, Crossman and Goodeve’s (1963/1983) model was also found wanting. As summarized by Wing (1983) and others who have reviewed the aiming literature (Elliott et al., 2010; Elliott et al., 2001; Rosenbaum, 2010; Schmidt et al., 2018), one problem was that successive submovements were not always observed. Another problem was that when successive submovements were observed (i.e., when there was significant slowing, followed by a significant speeding up), the stops (or near stops) did not always occur at a constant proportion, p, of the distance between the last stop point (or home position) and the target center. Finally, Crossman and Goodeve’s assumption that p lay between 0 and 1, which it had to in order to ensure homing in on the target, led to the prediction that there would never be overshoots, for p could never exceed 1. Yet overshoots were observed. Much as golfers hit balls that are too long as well as too short, people moving pen tips, cursors, and other indicators to targets often go too far and then reverse course. Homing in on targets is bidirectional, then, contrary to the prediction of Crossman and Goodeve’s model.

Given all these problems, one might ask why the iterative-corrections model got the attention it did, and why Wing (1983) decided to print it. The reason is that, because of its simplicity and transparency, it provided a framework for testing specific predictions within a fundamental domain of human perception and performance. The fact that it was incorrect in some details did not invalidate its main assumption that Fitts’ law stemmed from feedback-based error correction. By analogy, and without meaning to overstate the case given the grandeur of the comparison, the incorrectness of Aristotle’s theory of motion, as shown by Newton, and the incorrectness of Newton’s theory of motion, as shown by Einstein, did not lead to the expulsion of those earlier theories from reviews of progress in physics. So, too, was it with Crossman and Goodeve’s (1963/1983) model.

Schmidt

Crossman and Goodeve’s (1963/1983) model of aiming led to other models. These models, and especially the second of the two to be mentioned, brought the study of aiming to a state of reasonable integration. Although our coverage of Crossman and Goodeve’s model is now complete, we briefly review these successors because they help set the stage for the presentation of Crossman’s (1959) Power Law of Learning.

The first alternative to Crossman and Goodeve’s (1963/1983) model was a response to these authors’ idea that the temporal dynamics of aiming are mainly rooted in feedback. According to this alternative account (Schmidt, Zelaznik, Hawkins, Frank, & Quinn, 1979), most of the temporal variance in aiming can be explained by the ballistic phase rather than by the online corrective phase. Recall that these two phases were the ones Woodworth identified.

Schmidt et al. (1979) obtained evidence for their alternative hypothesis in experiments in which subjects moved a stylus as quickly as possible from one point to another. The fact that the target was just a point meant that subjects could not, strictly speaking, get inside the target; all they could do was get as close as possible. Schmidt et al. (1979) found that the standard deviation of the distance that was covered increased with the mean of the distance and decreased with the time in which the distance was traversed. Saying this another way, the standard deviation of the covered distance was proportional to velocity (distance divided by time). Building on this fact, Schmidt et al. (1979) argued that Fitts’ law could be explained without reference to feedback, but instead could be explained with reference to feedforward control (i.e., preprogramming of movements and subsequent ballistic execution).
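Expressed compactly (this is our restatement of the finding just described, not a formula taken from Schmidt et al., 1979), the relation was

$$ {S}_e\propto \frac{A}{MT}, $$

where S_e is the standard deviation of the movement endpoints, A is the mean distance covered, and MT is the movement time. On this view, a mover who preprograms a slower movement can satisfy an accuracy demand without any online correction.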

A reason to mention the model of Schmidt and his colleagues is to observe that Schmidt, like Woodworth and Fitts, was interested in skill more broadly. In his best-known work, Schmidt (1975) introduced a schema theory of skill learning. The main idea, which was traceable to Bartlett’s (1932) studies of memory and to Head’s (1920) studies of the representation of the body, was that knowledge of how to do things is expressed in schemas. Schemas may be thought of as functions (in the mathematical sense) that can take on a range of inputs and yield different corresponding outputs. Schemas are abstract in the sense that mathematical functions are.

Schmidt (1975) argued that the essence of skill is being able to perform in a range of related circumstances. Schemas support such flexibility, he suggested, in a way that more rigid knowledge structures cannot. Consistent with this hypothesis, Schmidt (1975) showed that learning in a range of circumstances leads to more skillful long-term performance than does learning in highly specific circumstances. This benefit is shown most clearly when new circumstances arise (e.g., van Rossum, 1990; but see Shea & Wulf, 2005).

Meyer

The final model of aiming covered in David’s lecture was that of David Meyer and his colleagues (Meyer, Abrams, Kornblum, Wright, & Smith, 1988; Meyer, Smith, & Wright, 1982). It is worth saying something about this model, both to round out the story of aiming and to observe that here was another psychological scientist of note—Meyer was elected to the National Academy of Sciences in 2009—who studied aiming and later set his sights on a more general theory of skill learning, as had been the case for the others discussed above, and for Crossman as well, as will be seen in the next section.

Meyer and colleagues wondered whether the model proposed by Schmidt et al. (1979), which focused on feedforward control, could be reconciled with the model of Crossman and Goodeve (1963/1983), which focused on feedback control. Meyer et al. (1988; Meyer et al., 1982) proposed that such a rapprochement was possible. They proposed a hybrid model that included both processes. According to their optimized submovement model, moving to a target can be viewed as a series of submovements, any of which may undershoot or overshoot the target. The proportion of the remaining distance to the target covered by a submovement can change (unlike in Crossman & Goodeve, 1963/1983), depending on the ultimate aim of minimizing the time to reach the target, subject to the constraint that the standard deviation of the submove distance increases with the submove velocity (as in Schmidt et al., 1979). Meyer et al. (1988; Meyer et al., 1982) showed that Fitts’ law reflects such optimization.
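For the simplest case the model treats, two submovements, the predicted average movement time takes (as we understand the derivation; see Meyer et al., 1988, for the exact statement) a square-root rather than logarithmic form,

$$ MT=a+b\cdot \sqrt{\frac{A}{W}}, $$

and with more submovements permitted the predicted times come, according to the authors, to approximate the logarithmic form of Fitts’ law ever more closely.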

Although some challenges have been raised to the model of Meyer et al. (1988; Meyer et al., 1982; e.g., that the distribution of the primary movements’ endpoint is not centered at the middle of the target, but rather undershoots the target more than it overshoots; see Elliott et al., 2010, and also Worringham, 1991), this model is generally considered the best account of aiming today.

Besides wanting to round out our coverage of aiming, our other reason for mentioning Meyer is that he pursued a general theory of skill learning (Kieras & Meyer, 1997). The model of Kieras and Meyer focused on the strengthening of stimulus–response bonds in the context of a production system (i.e., a system of if–then, condition–action pairs). According to the model, the probability and rapidity with which effective condition–action pairs can be produced increases with practice. The model makes detailed predictions and accounts for many core findings on skill acquisition, including the fact that speed increases with practice. This is the topic of the next section, and the topic for which Crossman is best known.

The Power Law of Learning

As we have said, many researchers who studied aiming were also interested in pursuing a general theory of skill learning. Was Crossman, too? Indeed he was. Crossman offered a quantitative description of the relation between task completion time and amount of practice. The description he offered has come to be called the Power Law of Learning. This law—so called because of its generalizability—has attracted a great deal of attention, not just in psychological science, but also in human factors, computer science, and other fields. All theories of skill learning that have sought to explain the relation between practice and task completion time have, to the best of our knowledge, addressed Crossman’s formulation.

The mathematical claim and its basis

What does the Power Law of Learning say, and on what basis did Crossman develop it? The Power Law of Learning says that the more a task is practiced, the less time the task takes. In addition, and more subtly, the rate at which the task time declines becomes smaller as practice continues. Expressed mathematically, the law says

$$ T=a\cdotp {P}^{-b}+c, $$

where T is task completion time, P is the amount of practice (typically expressed in terms of the number of times the task has been done), and a, b, and c are empirical constants.
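As a concrete illustration, here is a minimal Python sketch of the law with made-up constants (fitted values differ from task to task):

```python
def power_law_time(P, a=20.0, b=0.3, c=5.0):
    """Task completion time after P practice trials under the Power Law.

    a scales the improvable portion of the time, b governs how quickly the
    gains diminish, and c is the residual (asymptotic) time. The values
    used here are illustrative only.
    """
    return a * P ** (-b) + c

# Improvement is steep early on and ever more gradual later, with no point
# at which it stops altogether:
for P in (1, 10, 100, 10_000, 1_000_000):
    print(P, round(power_law_time(P), 2))
```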

In terms of the data that Crossman (1959) used to motivate the Power Law, it is often said that those data came from a study of factory cigar rollers. A Wikipedia entry on the Power Law of Learning (https://en.wikipedia.org/wiki/Power_law_of_practice; retrieved on February 2, 2019) is instructive in this regard. It says that Crossman developed it from “a study of a cigar roller in Cuba” (i.e., just one person in this Caribbean country). We mention this, aware that Wikipedia is not always the most reliable source of scholarly information, just to illustrate how rumor-prone the coverage of Crossman’s work has become. Its susceptibility to rumor reflects how often it is discussed.

If one reads Crossman’s (1959) article, one sees that cigar-rolling data did play a role in his research, but the data came from “several girls in the same shop, operating special-purpose cigar-making machines” (p. 156). On that same page of the article, Crossman says that the data were reported in full in chapter 10 of his doctoral dissertation (Crossman, 1956).

This chapter 10, entitled “Industrial Case-Studies of Perceptual Analysis,” describes, among other tasks, the two main steps of cigar making: bunch making and wrapper laying. The data presented in Fig. 2 of Crossman (1959) correspond to Fig. 10.4 of Crossman (1956). According to the figure captions in the article and the thesis, the data were gathered from ten (female) workers, each with a different amount of experience. The most important feature of the data, Crossman observed, was that the longer the workers had been on the job, the faster they got. Even cigar rollers who had rolled upward of ten million cigars were still getting faster, albeit ever so slightly. If there was an asymptote to the performance time (a final leveling-off value), it had not been reached, even after ten million cigar rolls. This conclusion was supported by Crossman’s (1959) fit of a Power Law equation to the data. A critically important feature of the Power Law fit is that it has no predefined final limit of performance. This contrasts with an exponential function, which does. The consideration of the exponential function will be important later in our discussion.

Cigar rolling was not the only speed skill that Crossman (1959) summarized. He discussed other tasks as well, including crossing out the letter e in nonsense French text, card sorting, adding digits, substituting one code for another, maze learning, and operating a lathe. For all these speed skills (the data sets were collected by others, though Crossman summarized them), there was a clear common feature: When the data were plotted in log–log coordinates (i.e., when the logarithm of task completion time was plotted as a function of the logarithm of the number of practice trials), they were fitted well with straight lines. Crossman (1959) emphasized that he was not the first to show this. Rather, the relationship was first shown by a Dutch researcher named de Jong (1957). Crossman was so impressed with de Jong’s result that he declared in his 1959 article that “the relationship may be called de Jong’s law” (p. 156). His statement to this effect provides some insight into his character.

The temporal dynamics behind the Power Law of Learning

De Jong (1957) was surely aware that linear fits in log–log coordinates reflect a Power Law relation, for taking the logarithm of both sides of a power function (setting aside the additive constant c) yields a linear relation, as shown below. But much as Fitts did not focus on saying why his quantitative relation held, neither did de Jong, as far as we know, although de Jong did mention that “Discontinuous and hesitant movements become smoother” (p. 55) with increasing cycles (i.e., with more practice).
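To make the step explicit: setting aside the additive constant c (or subtracting the asymptote out first), taking logarithms of both sides of the Power Law equation gives

$$ \log T=\log a-b\cdot \log P, $$

a straight line with slope −b when log T is plotted against log P.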

Crossman (1959) sought to come up with a process model for de Jong’s law, much as he did for Fitts’ law. Interestingly, the mechanism that Crossman (1959) proposed for the Power Law was similar to the one he proposed for Fitts’ law. In both cases, he conceived of a hunt based on feedback. For Fitts’ law, the hunt was for a target in a space of possible targets, where reaching the target was the culmination of the hunt. For the Power Law, the hunt was for a better method in a space of possible methods; finding a method that was better than the best one achieved so far (in terms of a shorter task completion time) was the goal. Finding a better method, in Crossman’s model, was, at any given time, a matter of chance.

Relying on chance in this way meant that Crossman appealed to trial-and-error learning. He did so explicitly, right from the start of his 1959 article. Indeed, he motivated his analysis by referring to Thorndike’s (1898) classic work on learning by trial and error, although he cited Hilgard’s (1948) textbook, Theories of Learning, rather than Thorndike’s work itself. Crossman (1959) also cited Fisher’s (1930) Genetical Theory of Natural Selection to establish the mathematical and biological plausibility of a random selection process.

Crossman showed that, under such a selection process, methods enabling ever shorter task completion times come to be retained, while the chance of finding a method better than the current best decreases as practice continues. The improvements therefore come more and more gradually and are smaller as practice continues. This relation is captured by the Power Law.

Developments after Crossman’s (1959) article

Since the publication of Crossman’s article, more demonstrations of the Power Law have appeared, including (but not limited to) reports that the law applies to the time to learn to read distorted text (Kolers, 1976), the time to read sentences aloud as well as subvocally (MacKay, 1982), and the time to write books for the prolific author Isaac Asimov (Ohlsson, 1992). It has also been used as a benchmark for developing theories of learning. Here is a quote about its importance from Heathcote, Brown, and Mewhort (2000, p. 185), a study we will focus on shortly:

The power function’s status as a law has also made it a gold standard by which to judge the success of models of skilled performance, including ACT and related models . . . , the component power laws model . . . , network models . . . , instance theories . . . , and Newell and Rosenbloom’s (1981) chunking model. . . . Logan (1988) leaves no doubt about the importance of the form of the practice function for theories of skill acquisition: “The power-function speedup [is] a benchmark prediction that theories of skill acquisition must make to be serious contenders” (p. 495).

In spite of all this support for the Power Law, its legitimacy has been questioned. The authors who provided the foregoing quote (Heathcote et al., 2000) entitled their article, “The Power Law Repealed: The Case for an Exponential Law of Practice.” As this title indicates, these authors felt that an exponential function provided a better fit than a power function for predicting practice-based time reductions.

What was the problem that Heathcote et al. (2000) saw with the Power Law? They observed that data for individual subjects were fitted better with an exponential than with a power function, though the Power Law gave a better fit to data averaged over subjects than did the exponential. According to these authors, only two previous teams had fitted single-subject as well as average-subject data, and those two teams had got the same result—better power-function fits for data averaged over individuals, but better exponential-function fits for the data of individuals (Josephs, Silvera, & Giesler, 1996; Rosenbloom & Newell, 1987).

In keeping with the historical focus of the present article, we note that since the publication of Heathcote et al.’s (2000) study, Snoddy’s (1926) original mirror-tracing task was replicated so that data from individual subjects could be fitted with an exponential as well as with a power function. Recall that Snoddy’s study was the one that provided the first source of evidence for a power-law relation, though the data were errors rather than times; Crossman was the first to focus on time. Harking back to Snoddy’s work, Stratton, Liu, Hong, Mayer-Kress, and Newell (2007) showed that the exponential function did indeed do a better job accounting for the mirror-tracing error data of 16 individual subjects than did the power function, exactly as predicted by Heathcote et al. At the same time, the power function did a better job accounting for the average data (the data averaged over subjects), again consistent with Heathcote et al.

What should one make of this? Certainly, important lessons can be drawn about the dangers of averaging. It is misleading if the best curve for an aggregate of data misrepresents what the data look like for individuals. This point has been discussed in connection with Heathcote et al.’s (2000) article by Lewandowsky and Farrell (2011). Heathcote et al. also emphasized this point in their article, but they stressed that they really cared about the difference between the two functions because the functions implied (or assumed) different kinds of learning rates. In the case of the exponential, the learning rate stays constant, whereas in the case of the power function, the learning rate decreases.
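One way to see the contrast (our restatement, in the spirit of Heathcote et al.’s, 2000, analysis) is to compare the rate of improvement with the improvement still available. For the exponential function \( T=a\cdot {e}^{-b\cdot P}+c \), that relative rate is

$$ -\frac{d\left(T-c\right)/ dP}{T-c}=b, $$

a constant, whereas for the power function \( T=a\cdot {P}^{-b}+c \) it is

$$ -\frac{d\left(T-c\right)/ dP}{T-c}=\frac{b}{P}, $$

which shrinks as practice accumulates.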

Does this difference matter? It might matter in applied settings where performance at various times could potentially differ, depending on the constancy or inconstancy of the putative learning rate. The difference might also matter if different models of learning, including neural or machine-learning models, had to be judged according to whether they predicted power-law or exponential learning. One might also question whether it makes sense for there to be an a priori final limit, which is the case for the exponential, but not for the power function.

As to whether the difference matters for Crossman’s legacy, we have two comments. First, Crossman did not check exponential function fits. The reason, presumably, was that his theory didn’t predict an exponential function. Many possible functions—indeed, an infinite number of them—might underlie any given data set. Checking the fits of functions that aren’t relevant for theoretical purposes is unnecessary. If Crossman didn’t evaluate exponentials, he needn’t be reproached for failing to do so, nor has he been, as far as we know, including by Heathcote et al. (2000).

Second, it is unclear, a priori, whether it would be antithetical to Crossman’s (1959) substantive theory if an exponential function prevailed over a power function. A random search process in which current solutions are supplanted by better solutions can lead to an exponential function, as shown below.

We conducted simulations in which we used the learning process that Crossman assumed, seeking to determine whether the simulated learning data could be accounted for with an exponential function as well as, or possibly better than, a power function (see the Appendix). Our general approach was to create a function, F(x), which, along with whatever parameters it used, comprised an ideal model for generating outputs given inputs x. Next, we had our (Matlab) program randomly choose parameter values that potentially reduced the sum of the squared deviations between the ideal and predicted outputs. We assumed that a learner effectively engages in such a random search process, at least if he or she is using trial-and-error learning. In effect, he or she hunts for better parameters, letting the new parameters take over once they are found. Critically, and in keeping with Crossman’s core claim, in our simulations better solutions were found by chance alone.

We ran our simulations as follows. For each of T = 2,000 trials per run, we generated a random parameter set and retained it if it yielded a better fit than the best parameter set that had been found so far. Whenever a better fit was found (i.e., a smaller sum of squared deviations between the ideal and obtained values), the trial number of the improvement was recorded, as was the smallest sum of squares up to that point. By repeating this process R = 2,000 times (i.e., running through the T = 2,000 trials a total of R = 2,000 times), we built a histogram of the trial numbers in which better fits (smaller sums of squares) were obtained.
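For readers who want the gist of the procedure in runnable form, here is a minimal Python sketch. It is not the Matlab code referred to above (see the Appendix); the ideal function F(x), the parameter ranges, and the constants are stand-ins chosen only to illustrate the random-search logic.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Stand-in "ideal model": a simple line y = 2x + 1 plays the role of F(x).
x = np.linspace(0.0, 1.0, 20)
ideal = 2.0 * x + 1.0

def sum_sq(params):
    """Sum of squared deviations between the ideal outputs and the outputs
    produced by a candidate (slope, intercept) parameter set."""
    slope, intercept = params
    return float(np.sum((ideal - (slope * x + intercept)) ** 2))

T_TRIALS, R_RUNS = 2_000, 2_000       # trials per run; runs (reduce for a quick test)
improvements = np.zeros(T_TRIALS)     # how often trial t produced a better fit

for _ in range(R_RUNS):
    best = np.inf
    for t in range(T_TRIALS):
        candidate = rng.uniform(-5.0, 5.0, size=2)  # random parameter set
        ss = sum_sq(candidate)
        if ss < best:                 # better than the best found so far?
            best = ss
            improvements[t] += 1      # record the trial number of the improvement

# 'improvements' is, in spirit, the histogram plotted in Fig. 3: better fits
# are found often in early trials and only rarely later on, a decline that a
# power function and an exponential function describe about equally well.
```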

The result is shown in Fig. 3. As expected, improved fits were found most often in early trials and less and less often thereafter. Whether the histogram was fitted with a power function or an exponential function made no meaningful difference: In both cases, the proportion of variance accounted for exceeded .99. The bottom line, apropos of Crossman (1959), is that a random search process akin to the one he proposed can yield data that are accounted for well by an exponential function. Therefore, finding that an exponential function fits actual data, and may actually fit better than a power function, does not impugn Crossman’s (1959) substantive theory.

Fig. 3 Simulation results of a random search process akin to the one proposed by Crossman (1959)

It is important to note that this topic still attracts considerable interest in contemporary research. Just recently, Evans, Brown, Mewhort, and Heathcote (2018) readdressed the issue of power versus exponential functions while also integrating additional phenomena, such as an “initial period of slower learning followed by a speed-up before the final approach to asymptote” (p. 594), by adding a delay parameter to the original formulations. According to their results, the delayed exponential model provided the best fit for the majority of the data sets considered. These authors also acknowledged a number of exceptions, so the topic will likely remain of interest in the coming years.

Ted Crossman

Who, then, is or was E. R. F. W. Crossman? In turning to this question, we return to the lecture where we decided to address it. In the lecture, when David got to Crossman and Goodeve (1963/1983), he said something similar to the following: “I’m about to tell you about a tremendously influential study in the field of aiming, but I know nothing about the first author, and everyone I’ve ever talked to in the field is equally ignorant of who this person is or was. I wish I could tell you more, but I can’t.” As David made this pronouncement, he showed a PowerPoint image of a gray figure with a question mark. The image contrasted starkly with the other pictures he had shown, of Robert Woodworth and Paul Fitts, and subsequently would show, of Richard Schmidt and David Meyer.

After the class, Markus told David that he, too, had long wondered about Crossman. It was then that we resolved to do some detective work. We decided we would try to find out who Crossman is or was, mainly to satisfy our curiosity, with no idea that we would eventually prepare an article about this person.

We began by emailing others in our field and learned very quickly that they, too, knew virtually nothing about Crossman. We also learned that they, like us, wanted to learn more. We also found out that most of these colleagues were (or are) a bit like the first author in not being quite as web-savvy as might be ideal. It turned out that with a bit of web-search smarts, one can get some basic information about Crossman. Markus, who is more web-savvy than David (perhaps because he is about 20 years younger), was able to get some useful information, including, most usefully, an obituary that we used to get in touch with his family members (www.berkeleydailyplanet.com/issue/2001-02-12/article/3412?headline=UC-Berkeley-Professor-Emeritus-Ted-Crossman-dies). Thanks to their cooperation, we can now present much more information than can be gotten on the internet.

Edward Robert Francis Ward Crossman (Fig. 4) was born on September 25, 1925, in Hambrook, Bristol, UK. He was the son of two doctors. He had a brother and a sister. Crossman went by the name “Ted,” as we learned from Rachel Hope Crossman, Ted’s daughter-in-law and our primary informant. As we also learned from Rachel, family tradition was to invoke the names of one’s ancestors, which is why Edward (“Ted”) had so many names. This may have also been why he published with that long string of initials.

Fig. 4 Ted Crossman in his passport photo (1989). The photo was generously provided by Crossman’s daughter-in-law, Rachel Hope Crossman

Ted Crossman was the nephew of a famous UK politician, Richard Crossman, who was a leading light of the British left. When Richard Crossman died in 1974, The New York Times ran a long obituary about him (https://www.nytimes.com/1974/04/06/archives/richard-crossman-66-is-dead-leading-thinker-of-british-left.html).

We have a story about Ted Crossman (the psychological scientist) and Richard Crossman (the politician) from Rachel, who wrote this to us in an email:

When Ted got married and went on a honeymoon he played a practical joke on his uncle and sent a postcard (intended to arrive on April Fool’s Day) saying that he and his bride had been kidnapped by the Russians. It apparently did not read as a joke and became a (minor) international incident with headlines in England screaming that the nephew of a member of parliament had been kidnapped! We have those newspaper clippings! Ted was a real character.

The honeymoon that Ted went on was with Patricia Marie Carter, with whom he had four children. The two oldest children died at separate times in adulthood. As we learned from Rachel, Francis (Frank) Hedley Danvers Crossman, Ted’s oldest son, died of a heroin overdose, and Ted’s daughter, Lucia (Lucy) Edna Alice, died from a brain aneurysm less than a year later. Ted Crossman’s marriage later ended in divorce.

Well before getting married and becoming a father, Ted joined the Royal Air Force at the age of 19. At the age of 22, he was stationed in Hiroshima, shortly after its bombing in World War II.

After his military service, Ted began his higher education. He went to Cambridge University and received a B.A. in Natural Sciences. After that, he did graduate work at Birmingham University, where he received his Ph.D. in Engineering Production in 1956. After getting his doctorate, he taught at Reading University, and then moved to Oxford University in 1962.

In short order, while at Oxford, Ted was offered tenured positions at MIT and Berkeley. He visited both places and, according to his family, chose Berkeley because the weather was so much better than at MIT, where the weather was too much like England’s. Another reason Crossman decided to go to Berkeley was that a new mechanical engineering building (Etcheverry Hall) was being built there. Crossman joined Berkeley’s Department of Industrial Engineering in 1964.

During his time at Berkeley, Crossman served as department chair from 1969 to 1970. He resigned that post after a short time to protest the university’s handling of the student demonstrations at People’s Park, especially the Bloody Thursday incident of May 15, 1969. This information came to us by way of our sources.

By all accounts, Crossman was a reclusive, somewhat quirky figure. According to his family, he would often be off in another world, so to speak—thinking. He had a notebook (or series of them) in which he wrote down ideas when they came to him, even in the midst of family dinners. One of the pictures we have of Crossman (Fig. 5) shows him apparently holding one of his notebooks.

Fig. 5 Ted Crossman in front of his house in Berkeley, California, taken around 1998. The small black book in his left hand (atop the newspaper) is most likely one of the notebooks he apparently carried with him most of the time, in his desire to write down new ideas, according to his daughter-in-law Rachel Hope Crossman, who kindly provided this photo. According to her, the newspaper was the Wall Street Journal

We also have a verbal sketch of Crossman from Professor John Morton of the Institute of Cognitive Neuroscience in London and former director of the Medical Research Council Cognitive Development Unit at University College London. Readers may recall that John Morton famously introduced the logogen model of memory (Morton, 1969), conducted experiments on the suffix effect in short-term memory (Morton, Crowder, & Prussin, 1971), wrote a brilliantly hilarious one-page article on recursion (Morton, 1976), and introduced the concept of perceptual centers for heard words (Morton, Marcus, & Frankish, 1976), among other contributions. Here is an extract from an email that Morton sent us about Crossman on May 24, 2018:

I met him in 1957. He was then a lecturer in psychology at Reading University. I guess he’d been there for three or four years. It was a very small department with just four lecturers and the head of Department, Magdalena (Maggie) Vernon. But there would only be a dozen or so honours undergraduates each year and I was the only graduate student in the Department for the whole three years I was there. . . . Ted became my de facto supervisor and protector. He was extraordinarily generous in his time, effectively built the amplifiers that were necessary to record eye movements (all very primitive by current standards), gave me someone to bounce ideas off. Not only that, but after I’d been in Reading a year I went to live in his house. He was married to Pat and they had at that time one child, Frankie. Pat was pregnant with Lucy, Rob followed the following year and Martin was born in Oxford I believe. Pat and Ted were a very unlikely couple. Pat was glamorous, vivacious, loquacious, interested in books and poetry music and theatre. Ted was interested in work, and that was all that manifested itself. He was an incredibly shy person. I was chatting just now to an old friend who I first met in Reading, also a psychologist, who commented “Ted was not one of the world’s great communicators.” He added that he thought Ted was very civilised, and treated students like grown-ups “despite being the worst lecturer I ever encountered.”

That shyness may also explain why, when we contacted former department colleagues of Crossman’s at Berkeley, their replies hinted at the fact that Crossman was, to say the least, not at the center of the social scene in the department. His family, too, confirmed that he stayed away from academic politics. Indeed, the fact that so little was known about him among researchers in his (and our) field may have been a reflection of how much he kept to himself. Yet the generosity he showed to others, reflected in his enthusiastic accolade for de Jong (1957) in the Power Law article, and his pioneering spirit in science, reflected in John Morton’s remembrance, were communicated to us as well by Pete Goodeve, the co-author of Crossman and Goodeve (1963/1983), with whom we corresponded by email on October 19–20, 2018:

I spent my graduate time in 1174 Etcheverry Hall (in the basement—next to the nuclear reactor at the time!). Ted had managed to acquire the first minicomputer on campus—a PDP-8—which I spent a lot of time working on. As I remember, to get it he had to avoid calling it a “computer.” It was a “Digital controller” or some such. . . . One of his main interests was always limb control—following on from that original paper. I remember he built an apparatus with a cord attached to a stylus and potentiometers with which we could track the actual time course of a hand movement. The PDP-8 was of course essential for that. . . . I think he was probably the first to use a minicomputer in psych research. In later years a lot of folks on campus were doing the same.

Ted Crossman retired in 1987. According to his obituary, “he maintained an office in Etcheverry Hall until the time of his death, where he continued to meet with graduate students” (Berkeley Daily Planet). The obituary reported that he died after a short illness on February 5, 2001, at the age of 75.

Concluding remarks

The story we have told gives a small glimmer of who Ted Crossman was. The most surprising feature of his story, which frankly was a bit disappointing to us, was that the two articles of his that we knew about were the main articles he produced. Our consultation of Google Scholar revealed a remarkably sparse publication record, other than his 1959 and 1963/1983 works. He published some work on optometry (Crossman, Goodeve, & Marg, 1970; Marg, Crossman, Goodeve, & Wakamatsu, 1972), other work on discriminability (Crossman, 1955), some work on organizational issues (Globerson & Crossman, 1976a, 1976b), and a few book chapters and reports, among them studies from the Fire Research Group at UC Berkeley. When we asked Pete Goodeve (who left academia) about Crossman’s relative lack of productivity, Goodeve wrote

My guess is that the environment at Oxford was highly stimulating for his interests. Industrial Engineering at Berkeley . . . may have been much less so. I never got a sense of much interaction between the Human Factors Lab and other members of the department.

Goodeve also told us that he did not think the deaths of Crossman’s two oldest children were the cause of Crossman’s lack of productivity, nor was Crossman’s divorce, which, according to Goodeve, was ultimately amicable.

Perhaps the single clearest hint as to the cause of the relative paucity of published work by Crossman was his own reticence. The fact that it took another investigator, Alan Wing, to get Crossman and Goodeve’s paper into print, 20 years after the work was presented at a meeting and written up as a technical memorandum that was passed around informally, speaks to Crossman’s reluctance to publish or to his relative indifference about doing so. Not all investigators are intent on publishing, of course, and the times and culture around publishing were different in the 1950s and 1960s than they are now.

This article is appearing nearly 40 years after Alan Wing’s resurrection of Crossman and Goodeve’s mimeograph. The fact that researchers like Alan Wing and the two of us have found Crossman’s work worth considering over such long spans of time speaks to the depth of Crossman’s thinking. Ted Crossman must surely be viewed as one of the important thinkers in the area of human perception and performance. He had a deep influence within this domain, and in psychological and other sciences more generally. From what we have learned, it appears that Ted Crossman was not just a brilliant person, but a generous one as well. We regret that we never met him, but are glad to have been able to learn at least a little and to be able to share what we have learned with others.