Is it impossible to acquire absolute pitch in adulthood?

Wong, Yetta Kwailing; Lui, Kelvin F. H.; Yip, Ken H. M.; Wong, Alan C.-N.

doi:10.3758/s13414-019-01869-3

Published: 04 November 2019

Volume 82, pages 1407–1430, (2020)

Attention, Perception, & Psychophysics

Yetta Kwailing Wong¹,
Kelvin F. H. Lui²,
Ken H. M. Yip² &
…
Alan C.-N. Wong ORCID: orcid.org/0000-0002-2129-3485²

Abstract

Absolute pitch (AP) refers to the rare ability to name the pitch of a tone without external reference. It is widely believed to be only for the selected few with rare genetic makeup and early musical training during the critical period, and therefore acquiring AP in adulthood is impossible. Previous studies have not offered a strong test of the effect of training because of issues like small sample size and insufficient training. In three experiments, adults learned to name pitches in a computerized, gamified and personalized training protocol for 12 to 40 hours, with the number of pitches gradually increased from three to twelve. Across the three experiments, the training covered different octaves, timbre, and training environment (inside or outside laboratory). AP learning showed classic characteristics of perceptual learning, including generalization of learning dependent on the training stimuli, and sustained improvement for at least one to three months. 14% of the participants (6 out of 43) were able to name twelve pitches at 90% or above accuracy, comparable to that of ‘AP possessors’ as defined in the literature. Overall, AP continues to be learnable in adulthood, which challenges the view that AP development requires both rare genetic predisposition and learning within the critical period. The finding calls for reconsideration of the role of learning in the occurrence of AP, and pushes the field to pinpoint and explain the differences, if any, between the aspects of AP more trainable in adulthood and the aspects of AP that are potentially exclusive for the few exceptional AP possessors observed in the real world.

Absolute pitch (AP) refers to the ability to name the pitch of a tone (e.g., naming a tone as “C”) or to produce it without external reference tones (Takeuchi & Hulse, 1993; W. D. Ward, 1999). While the majority of us can effortlessly identify a countless number of faces, objects, and visual and auditory words, most people find it very difficult to name the twelve pitches, and professional musicians are no exception (Athos et al., 2007; Levitin & Rogers, 2005; Zatorre, 2003). The most extreme estimate states that, in every 10,000 people, there is one ‘AP possessor’ who can perform AP judgment accurately and effortlessly (Takeuchi & Hulse, 1993). This rare ability is considered a special talent and endowment for gifted musicians (Deutsch, 2002; Takeuchi & Hulse, 1993; W. D. Ward, 1999; but see Levitin & Rogers, 2005). The genesis of AP has therefore been a perplexing research topic among musicians, psychologists and neuroscientists for more than a century (Deutsch, 2002; Levitin & Rogers, 2005; Takeuchi & Hulse, 1993; W. D. Ward, 1999).

At the cognitive level, a widely accepted hypothesis suggests that AP involves two stages of processing (Levitin, 1994; Levitin & Rogers, 2005). The first stage is AP memory, which general listeners can also establish through exposure to tones, songs and music excerpts (Halpern, 1989; Levitin, 1994; Schellenberg & Trehub, 2003). This pitch memory is typically regarded as implicit, i.e., participants cannot verbally describe what has been remembered, and as absolute because participants can discriminate between original musical excerpts and the excerpts that were shifted in pitch by one semitone without the assistance of external pitch references (Schellenberg & Trehub, 2003). The second stage is the ability to associate the represented AP memory with verbal labels, which is somehow mastered by the ‘AP possessors’ only (Brancucci, Dipinto, Mosesso, & Tommasi, 2009; Deutsch, 2002; Levitin & Rogers, 2005; Schellenberg & Trehub, 2003; Vanzella & Schellenberg, 2010). In other words, the bottleneck of AP performance is in assigning verbal labels to the tones rather than establishing AP memory representation per se.

At the neural level, AP has been associated with distinct neural markers. In functional neural imaging, ‘AP possessors’ have shown an increased activity in the superior temporal gyrus (Ohnishi et al., 2001; Schulze, Gaab, & Schlaug, 2009; Wengenroth et al., 2013; S. J. Wilson, Lusher, Wan, Dudgeon, & Reutens, 2009) and the left dorsal lateral prefrontal cortex (Bermudez & Zatorre, 2005; Zatorre, Perry, Beckett, Westbury, & Evans, 1998). Individuals with and without AP have also been shown to differ in their brain activations reflected in various event-related potentials (ERPs), including the N1 (Itoh, Suwazono, Arao, Miyazaki, & Nakada, 2005; Pantev et al., 1998; Wu, Kirk, Hamm, & Lim, 2008), P2a (Wengenroth et al., 2013), and P3 (Hantz, Crummer, Wayman, Walton, & Frisina, 1992; Hirose et al., 2002; Klein, Coles, & Donchin, 1984; Rogenmoser, Elmer, & Jäncke, 2015). Other structural and functional connectivity differences have also been identified, including the size asymmetry of the planum temporale (Keenan, Thangaraj, Halpern, & Schlaug, 2001; Schlaug, Jancke, Huang, & Steinmetz, 1995), the architecture of the superior longitudinal fasciculus (Oechslin, Imfeld, Loenneker, Meyer, & Jäncke, 2009), and functional connectivity of different brain regions (Jäncke, Langer, & Hänggi, 2012; Loui, Li, Hohmann, & Schlaug, 2011; Loui, Zamm, & Schlaug, 2012). These suggest that AP may be supported by both functional and structural brain differences in ‘AP possessors’, as compared with other listeners.

The role of genes and the critical period

How can we explain the development of AP in ‘AP possessors’? One influential theory suggests that AP develops if two prerequisites are fulfilled (Bachem, 1940; Baharloo, Johnston, Service, Gitschier, & Freimer, 1998; Chin, 2003; Drayna, 2007; Levitin & Rogers, 2005; Ross, Olson, Marks, & Gore, 2004; Takeuchi & Hulse, 1993; Trainor, 2005; Zatorre, 2003). The first prerequisite is the rare genetic disposition to AP. It is reported that AP runs in families, as siblings of ‘AP possessors’ are more likely to be ‘AP possessors’ (about 14–48% of ‘AP possessors’ reported having a sibling or first-degree relative who was also an ‘AP possessor’; Baharloo et al., 1998; Baharloo, Service, Risch, Gitschier, & Freimer, 2000; Gregersen, Kowalsky, Kohn, & Marvin, 1999, 2001). AP is often considered a relatively ‘clean’ cognitive phenotype (Baharloo et al., 2000; Gregersen et al., 2001), and some have proposed that AP may be subserved by a single gene (Drayna, 2007; but see Theusch, Basu, & Gitschier, 2009, which suggests that AP is genetically heterogeneous).

The second prerequisite is an early onset of musical training that is within the critical period in childhood. A strong definition of critical period refers to the early period of life during which experience is essential for normal development and leads to permanent changes in brain functions and behavior (Knudsen, 2004)^{Footnote 1}, and we based the current discussion on this definition because this description fits well with the widespread belief of AP (Bachem, 1940; Baharloo et al., 1998; Chin, 2003; Drayna, 2007; Levitin & Rogers, 2005; Ross et al., 2004; Takeuchi & Hulse, 1993; Trainor, 2005; Zatorre, 2003). The critical period of AP is thought to be similar to that of language development (Chin, 2003; Deutsch, Dooley, Henthorn, & Head, 2009). Supporting evidence comes from large-scale survey studies in which the average onset of musical training of ‘AP possessors’ was 5.4 years, about 2.5 years earlier than that of ‘non-AP possessors’ (Gregersen et al., 1999; see also Baharloo et al., 1998; Gregersen, Kowalsky, Kohn, & Marvin, 2001).

Importantly, AP training was considered relatively successful in young children (Crozier, 1997; Miyazaki & Ogawa, 2006; Sakakibara, 2014). In contrast, while AP can improve to some extent in adulthood with deliberate practice (Brady, 1970; Cuddy, 1968, 1970; Hartman, 1954; Meyer, 1899; Mull, 1925; Russo, Windell, & Cuddy, 2003; Van Hedger, Heald, Koch, & Nusbaum, 2015; Wedell, 1934), there is no convincing evidence that adults can attain a performance level comparable to ‘AP possessors’ through training (Bachem, 1940; Levitin & Rogers, 2005; W. D. Ward, 1999). According to this influential theory, most professional musicians fail to acquire AP because they fail to start their music training within the critical period and/or because they do not carry the specific genes. Training AP in adulthood, when the critical period of acquiring AP has long passed, should be practically impossible (Bachem, 1940; Crozier, 1997; Trainor, 2005; but see Gervain et al., 2013 for the possibility of reopening the critical period by taking a drug).

Interestingly, the concept of a critical period explains the development of the basic functions and structure of different sensory systems well, but not that of high-level cognitive abilities. In vision, depriving the visual input into one eye led to reduced ocular dominance in the visual cortex of young cats but not in deprived adult cats (Hubel & Wiesel, 1970; Wiesel & Hubel, 1963). In hearing, disrupting sound input led to abnormal development of the auditory cortex in the early life of rats but not in older rats (Zhang, Bao, & Merzenich, 2002). In touch, depriving tactile inputs after birth led to abnormal functions of the somatosensory cortex (Simons & Land, 1987). Overall, normal development of the sensory cortices is determined by the environmental inputs during the early period of life (but see Hooks & Chen, 2007 for recent challenges of this view).

However, the evidence is less clear for high-level cognitive abilities. In vision, higher-level visual abilities, such as acuity, stereopsis and crowding, continue to develop in adulthood (Daw, 1998). In musical development, various degrees of plasticity in adulthood have been observed during the acquisition of musical pitch structure (e.g., consonance and dissonance, scale structure and harmonics) and the functional specialization of the auditory and motor cortices (see review in Trainor, 2005). In language acquisition, while it has been argued that there is a critical period for acquiring a native-like accent (Patkowski, 1990; Scovel, 1988), subsequent research demonstrated that it is possible for later learners, who started to acquire a second language after twelve years old, to attain native accents in various languages, including English, French and Dutch (Flege, Munro, & MacKay, 1995; see review in Singleton, 2001). In synesthesia, in which the perception of a stimulus consistently evokes experience that is not physically present (e.g., seeing the color ‘red’ on a black letter; J. Ward, 2013), the proposed cause is reduced pruning during early childhood within the critical period (Maurer & Mondloch, 2006). While synesthesia has a close genetic link with AP (Gregersen et al., 2013), synesthetic experiences can also be learned in adulthood (Bor, Rothen, Schwartzman, Clayton, & Seth, 2015). In sum, different complex human behaviors, previously thought to be constrained by the critical period, are now shown to be learnable in adulthood. AP seems to stand as an interesting exception, in that its development is suggested to be constrained by the critical period (Zeanah, Gunnar, McCall, Kreppner, & Fox, 2011).

A closer look into the literature of AP indicates that direct evidence for a critical period constraining AP development is weak. Similar to accent acquisition, some cases of AP were identified with later onset of musical training in the normal population (about 3-4% in the age groups of 9-12 and beyond 12; Baharloo et al., 1998) and in individuals with the Williams Syndrome (3 out of 5 ‘AP possessors’ with the Williams Syndrome started musical training at 8 years old or later; Lenhoff, Perales, & Hickok, 2001). These suggest that early onset of musical training during the critical period is not essential for developing AP (Gregersen et al., 2001). If the critical period is not essential and the genetic contribution of AP is only moderate, these two factors are unlikely to be the whole picture that describes AP development (Baharloo et al., 1998, 2000; Gregersen et al., 1999, 2001). What other factors could explain the genesis of AP?

The role of experience

A possible alternative is that AP is developed through perceptual learning. Perceptual learning refers to the long-term improvement on a perceptual task as a result of perceptual experience (Fahle & Poggio, 2002; Goldstone, 1998; Sasaki, Nanez, & Watanabe, 2010). It has been repeatedly demonstrated that humans extract information from environmental inputs and fine-tune their perceptual representations accordingly in different sensory modalities, including the visual (Fiorentini & Berardi, 1980; Karni & Sagi, 1993), auditory (Fujioka, Ross, Kakigi, Pantev, & Trainor, 2006; Kraus & Banai, 2007), somatosensory (Sathian & Zangaladze, 1997) and olfactory domains (D. A. Wilson & Stevenson, 2003). Perceptual learning occurs even when the stimuli are irrelevant to the task at hand (Seitz & Watanabe, 2009), for the non-diagnostic and task-irrelevant information of the trained stimuli (Y. K. Wong, Folstein, & Gauthier, 2011), and when participants are unaware of the information carried by the stimuli (Tsushima, Sasaki, & Watanabe, 2006; Watanabe, Nanez, & Sasaki, 2001). These demonstrate how the perceptual system is constantly tuned to environmental inputs.

Consistent with the perceptual learning hypothesis of AP, it is well-established that AP is shaped by experience (Takeuchi & Hulse, 1993; Y. K. Wong & Wong, 2014). For example, better pitch naming is often observed in musicians when the testing conditions better match their prior experience, such as using tones in the timbre of one’s own instrument (see review in Takeuchi & Hulse, 1993), tones in a highly used pitch like ‘A4’, which is the standard tuning pitch in orchestras (Levitin & Rogers, 2005; Takeuchi & Hulse, 1993), tones presented in a multisensory testing context similar to one’s musical training (Y. K. Wong & Wong, 2014), and tones associated with the more frequently used white keys than black keys (Athos et al., 2007; Miyazaki, 1989, 1990; Takeuchi & Hulse, 1993). Even performance in ‘AP possessors’ can be disrupted by recent listening experience with detuned music (Hedger, Heald, & Nusbaum, 2013). Experience also explains why non-musicians can recognize the starting tones of songs or melodies that are highly familiar in an absolute manner, which indicates that non-musicians form absolute memory of tunes based on repeated exposure (Levitin, 1994; Schellenberg & Trehub, 2003).

Previous training studies might have failed to convince researchers that AP is trainable in adulthood because of various issues. For example, training duration was very short in some studies (roughly 1-4 hours; Cuddy, 1970; Mull, 1925; Van Hedger et al., 2015), which might have limited the potential of observing larger effects in AP learning. Some training studies used a very small sample size (three or fewer participants per condition; Brady, 1970; Cuddy, 1968; Hartman, 1954; Meyer, 1899; Mull, 1925; Wedell, 1934) and involved self-training of the authors of the manuscripts (Brady, 1970; Meyer, 1899). Other training studies were difficult to interpret. For example, with only a binary judgment on a single tone learned (e.g., “C” or not “C”; Mull, 1925; Russo et al., 2003), it is difficult to interpret whether the learned ability is comparable to ‘AP’ or not. Another study did not provide sufficient information (e.g., details of the testing task after training, accuracy, etc.) for interpreting the performance of the participants (Lundin, 1963). Therefore, despite the apparently substantial AP improvement attained in some of these studies (e.g., Brady, 1970; Lundin, 1963), it remains unclear whether AP can be acquired in adulthood. Furthermore, previous training studies might have lacked key components of effective training regimes, such as computerized stimulus presentation and feedback protocols that enable individualized training progress, incentives for sustaining motivation during training, etc., which might have collectively limited the potential AP improvement during training.

In this study, we tested whether AP acquisition in adulthood, i.e., attaining a performance level comparable to that of real-world ‘AP possessors’, is possible through perceptual training. This directly tested whether a critical period of development of AP exists, as stated in the genes and critical period theory of AP. We also aimed to examine whether the training-induced improvements are consistent with those of perceptual learning. This tested whether the perceptual learning theory of AP development can account for the behavioral changes (if any). We defined AP as the ability to name pitches accurately without external references, which is one of the most common definitions of AP in the literature (Levitin & Rogers, 2005; Takeuchi & Hulse, 1993; W. D. Ward, 1999). To test whether AP can be acquired, we first needed to define the performance level of real-world ‘AP possessors’. However, our survey of the literature revealed that the definition of ‘AP possessors’ was variable and arbitrary in previous work (e.g., using self-report or different performance-based measurements, with different scoring methods and cut-off points; see Methods). In this study, we arbitrarily adopted a relatively stringent training criterion of AP: being able to name, without any reference, all of the twelve pitches that constitute an octave at 90% accuracy, with semitone errors considered incorrect. Our survey of the literature indicates that participants who have passed our training would also be considered ‘AP possessors’ in most of the published AP studies that adopted a performance-based definition of ‘AP possessors’ (see Methods of Experiment 1). This indicates that individuals having passed our training have acquired AP performance comparable to that of ‘AP possessors’ as defined in the literature.

To address our question, we designed different training regimes of AP by optimizing various factors, including the use of extended training duration, larger sample sizes, modern computerized protocols and measures to provide individualized training and to sustain motivation, e.g., by gamifying the training and including monetary reward for the training progress. In three experiments, we trained 43 adults to name the pitch of tones with different combinations of timbres and octaves for 12-40 hours in laboratory and mobile online settings. If specific genetic disposition and an early onset of music training are essential for AP acquisition, AP training in adulthood should be largely in vain and result in very limited improvement in all participants. Alternatively, if AP can be trained in adulthood as a type of perceptual learning, it should be possible, at least for some individuals, to attain a performance level similar to that of real-world ‘AP possessors’.

In addition, if AP is developed through perceptual learning, then AP learning should demonstrate several characteristics that match those observed in perceptual learning studies. First, perceptual learning typically leads to performance enhancement after training (Fahle & Poggio, 2002; Goldstone, 1998). In this study, we examined whether AP performance improved by measuring the number of pitches that participants learned to name accurately without feedback through training (see Methods).

Second, we examined whether the degree of generalization of AP learning changed as a function of the octaves and timbres involved in training. Although perceptual learning is often regarded as highly specific to the training stimuli (Fahle & Poggio, 2002; Watanabe & Sasaki, 2015), especially when the training is difficult and supported by more specific inputs (Ahissar & Hochstein, 2004), generalization of learning is often present in various untrained conditions to different degrees (Banai & Lavner, 2014; Fahle & Poggio, 2002; Watanabe & Sasaki, 2015; Wright, Buonomano, Mahncke, & Merzenich, 1997). Instead of simply regarding perceptual learning as ‘specific’, the degree of learning specificity and generalization during perceptual training is best conceptualized by understanding the psychological space involved in training and testing – whether the testing space overlaps with the training space would affect the extent of generalization (Y. K. Wong et al., 2011; see also Nosofsky, 1986, 1987; Palmeri & Gauthier, 2004). Hence, we predicted that including training tones in more octaves and timbres (as in Experiment 1 and 3) should cover larger regions of the psychological space, and thus lead to better generalization to untrained octaves and timbres. Including training tones in a single octave and timbre only should cover a smaller part of the psychological space (as in Experiment 2) and therefore should result in more specific AP learning.

Third, perceptual enhancement in perceptual learning is relatively long-lasting, often in the time scale of months or even years (Fahle & Poggio, 2002; Goldstone, 1998; Karni & Sagi, 1993; Watanabe & Sasaki, 2015). We tested whether AP improvement was sustained for one to three months after the end of training.

Finally, if AP is developed through perceptual experience, it is possible that musicians, with their long-term musical training, may develop AP more easily than non-musicians. Learning AP is about establishing the association between the tones and the pitch names. Musicians’ long-term exposure to musical tones has fine-tuned the perceptual representations of tones in the auditory system (Pantev et al., 1998), which may in turn facilitate their development of the mapping between the tones and the pitch names. In Experiment 2, musicians had at least six years of formal music training, while non-musicians had no formal music training or had brief training for less than a year. We explored whether musicians would learn AP more efficiently than non-musicians.

Experiment 1

In Experiment 1, we tested whether learning AP in adulthood is possible and examined the pattern of AP learning. Since how much training is sufficient to enable AP acquisition in adulthood is an untested empirical question, we decided to conduct 12 hours of training in this experiment, which was comparable to some previous perceptual learning studies in our laboratory (A. C.-N. Wong, Palmeri, & Gauthier, 2009; Y. K. Wong et al., 2011).

Methods

Participants

Ten adults who were native Cantonese speakers were recruited at City University of Hong Kong and completed the training. They included 2 males and 8 females, who were 23.1 years old on average (SD = 4.50). Seven of them were regarded as musicians, as they were formally trained in music for 2–10 years, with the major instrument being piano (N = 5), violin (N = 1) and flute (N = 1). Three were regarded as non-musicians as they had no prior formal music training. One additional participant dropped out in the middle of the training and was excluded from all analyses. All participants filled out a questionnaire about their music training background, including the musical instruments and the highest ABRSM exam passed, and reported if they regarded themselves as ‘AP possessors’. As expected, none considered themselves ‘AP possessors’ before training (otherwise joining the lengthy training would be pointless). In the test for generalization before training (see below), all participants produced an average pitch-naming error larger than 1 semitone with the training tones. A within-1-semitone error was one of the commonly used definitions for ‘AP possessors’ in the literature (e.g., Bermudez & Zatorre, 2009; Crozier, 1997; Loui et al., 2012), indicating that their pitch naming ability was not comparable with that of ‘AP possessors’ before training. They received monetary compensation for the training and testing. Table 1 shows the number of participants and the design for all experiments.

Table 1 Details of the training protocols in the three experiments

The sample size was estimated based on a recent AP training study (Van Hedger et al., 2015) using GPower 3.1.9.2. In this study, a large effect size was observed for the training improvement in adults (pretest vs. posttest; f = 1.34). Using the same f, the sample size required to detect any training effect at p = .05 with a power of 0.95 was 5 participants. To be more conservative, we recruited 10 participants. This sample size was also consistent with that used in previous perceptual training studies (Chung & Truong, 2013; Y. K. Wong et al., 2011).

Materials & Stimuli

The experiment was conducted on personal computers using Matlab (Natick, MA) with the PsychToolbox extension (Brainard, 1997; Pelli, 1997) at the Cognition and Neuroscience Laboratory at City University of Hong Kong. Participants were requested to bring their own earphones to the training and testing. They adjusted the volume to a comfortable level before the training or testing started.

In Experiment 1, 120 tones were used. They were synthetic tones and piano tones in octaves three to six, and violin tones in octaves four to five (see Supplementary Figure 1 for their spectral envelope). The synthetic tones were identical to those in prior AP tests, and were generated by summing a series of sinusoidal waveforms including the fundamental frequency and harmonics (Bermudez, Lerch, Evans, & Zatorre, 2009). The piano tones were recorded with an electric keyboard (Yamaha S31). The violin tones were recorded by a volunteer violinist in a soundproof room. The precision of the tones was checked during recording by a tuner. The deviation between the actual frequency of the recorded tones and the expected frequency for the equal-tempered scale (with A4 = 440 Hz) ranged from .012% to .765%, with a mean deviation of .231%, SD = .193.
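The expected frequencies and the percentage deviations described above follow from the standard equal-temperament formula. The sketch below is only an illustration: the function names are hypothetical, and taking the deviation as an unsigned percentage of the expected frequency is an assumption, since the text does not give the exact formula used.

```python
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def equal_tempered_hz(pitch: str, octave: int, a4_hz: float = 440.0) -> float:
    """Frequency of a note in 12-tone equal temperament, with A4 as the reference."""
    # Semitone distance from A4; octave numbers change at C (scientific pitch notation).
    semitones_from_a4 = (octave - 4) * 12 + PITCH_NAMES.index(pitch) - PITCH_NAMES.index("A")
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)

def percent_deviation(measured_hz: float, expected_hz: float) -> float:
    """Unsigned deviation from the expected frequency, in percent (assumed definition)."""
    return abs(measured_hz - expected_hz) / expected_hz * 100.0

print(round(equal_tempered_hz("C", 4), 2))                            # 261.63
print(round(percent_deviation(441.0, equal_tempered_hz("A", 4)), 3))  # 0.227
```

For scale, a tone recorded 1 Hz sharp at A4 deviates by about .227%, close to the mean deviation reported for the recorded tones.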

The sound clips were 32-bit with a sampling rate of 44100 Hz. They were edited in Audacity such that they lasted for 1 second with a 0.1-second linear onset and 0.1-second linear offset, and were matched by ear for similar perceptual magnitude^{Footnote 2}.
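The shape of such a clip, a harmonic complex lasting 1 second with 0.1-second linear onset and offset ramps at 44100 Hz, can be sketched as follows. This is not the authors' procedure (the clips were edited in Audacity): the function names, the number of harmonics and the equal harmonic amplitudes are illustrative assumptions.

```python
import math

SAMPLE_RATE = 44100  # Hz, as stated in the text

def linear_ramp_envelope(n_samples, ramp_samples):
    """Gain per sample: linear rise over the first ramp, linear fall over the last ramp."""
    env = []
    for i in range(n_samples):
        rise = i / ramp_samples                    # 0 at the first sample
        fall = (n_samples - 1 - i) / ramp_samples  # 0 at the last sample
        env.append(min(1.0, rise, fall))
    return env

def harmonic_tone(f0_hz, duration_s=1.0, ramp_s=0.1, n_harmonics=4):
    """Sum of sinusoids at f0 and its harmonics, shaped by linear onset/offset ramps."""
    n = int(round(duration_s * SAMPLE_RATE))
    env = linear_ramp_envelope(n, int(round(ramp_s * SAMPLE_RATE)))
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # Equal-amplitude harmonics are an illustrative choice, not the published recipe.
        s = sum(math.sin(2 * math.pi * f0_hz * k * t) for k in range(1, n_harmonics + 1))
        samples.append(env[i] * s / n_harmonics)
    return samples

tone = harmonic_tone(440.0)
print(len(tone))  # 44100 samples, i.e. 1 second
```

Dividing by the number of harmonics keeps every sample within [-1, 1]. Loudness matching by ear, as described above, is not modeled here.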

Absolute pitch training

The training was gamified and structured into 80 levels, which were organized into ten 8-level parts with an increasing number of pitches (from 3 pitches in the first 8 levels to 12 pitches in the last 8 levels; Fig. 1A). Each eight-level part consisted of four types of levels, which included tones that were progressively richer in timbres (synthetic, or synthetic & piano) and octaves (4, or 4 & 5). Tones in different octaves and timbres were introduced gradually into the training so as to break down the learning steps into smaller ones and make the learning more achievable. The order of adding new tones, i.e., tones in new octaves were introduced before new timbres, was arbitrary.

Each of the four types of levels was repeated twice, once with trial-by-trial feedback provided, and then once without feedback (Fig. 1A). If participants achieved 90% accuracy for a certain level, they would proceed to the next level; otherwise they would stay at the same level (Fig. 2A). The training was completed by finishing 12 hours of training or by passing all 80 levels with 90% accuracy. Participants finished one hour of training per day. They were trained on at least four days per week and finished the training in three weeks.

For example, participants began the training with three pitches (E, F and F#). At levels 1–2, synthetic tones in these three pitches in octave 4 were included, with feedback provided at level 1 and then without feedback at level 2. At levels 3–4, synthetic tones in both octaves 4 and 5 were included with feedback and then without feedback. At levels 5–6, synthetic tones and piano tones in octave 4 were included with feedback and then without feedback. At levels 7–8, synthetic tones and piano tones in octaves 4 and 5 were included with feedback and then without feedback. At the no-feedback levels, participants were not provided with any feedback on the correctness of the tones. These no-feedback levels served as mini milestones for participants’ AP performance at 90% accuracy. If they achieved 90% accuracy at the 8th level, a new pitch was added into the training set, with which they went through the same 8-level part again. Each level included 20 trials, with tones distributed as evenly as possible among the training pitches, octaves and timbres.
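The level structure and the progression rule described above can be written out as a small schedule generator. This is a minimal sketch: the `Level` class and the function names are hypothetical, and the order in which pitches were added beyond the first three is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Level:
    number: int
    n_pitches: int
    timbres: tuple
    octaves: tuple
    feedback: bool

# The four level types within each 8-level part, in the order given in the text.
LEVEL_TYPES = [
    (("synthetic",), (4,)),
    (("synthetic",), (4, 5)),
    (("synthetic", "piano"), (4,)),
    (("synthetic", "piano"), (4, 5)),
]

def build_schedule():
    levels = []
    for n_pitches in range(3, 13):          # ten parts: 3, 4, ..., 12 pitches
        for timbres, octaves in LEVEL_TYPES:
            for feedback in (True, False):  # with feedback first, then without
                levels.append(Level(len(levels) + 1, n_pitches, timbres, octaves, feedback))
    return levels

def next_level(current, n_correct, n_trials=20, criterion=0.90):
    """Advance by one level only if accuracy reaches the criterion; otherwise repeat."""
    return current + 1 if n_correct / n_trials >= criterion else current

schedule = build_schedule()
print(len(schedule))  # 80
print(schedule[7])    # level 8: 3 pitches, synthetic and piano, octaves 4 and 5, no feedback
```

With 20 trials per level, the 90% criterion corresponds to at least 18 correct responses, so `next_level(8, 18)` advances to level 9 while `next_level(8, 17)` repeats level 8.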

A pitch-naming task was used (Fig. 2A). During each trial, an isolated tone was presented for 1s. Then, an image showing the pitch-to-key mapping was presented. Participants were required to name the pitch of the presented tone by key press within 5s. Semitone errors were considered errors in the training. Before each level, participants were allowed to freely listen to sample tones of the training pitches as many times as they preferred before proceeding to the training (Fig. 2A). In each one-hour training session, participants might have finished different numbers of training trials depending on their pace of learning (e.g., the amount of time spent on training trials or on sample tone listening).

Normally, participants earned one point for a correct answer in each trial. To motivate participants, a special trial that was worth three points randomly appeared with a chance of 1/80. Also, participants were given 1, 2 and 3 tokens if they achieved 60%, 75% and 90% accuracy at a training level respectively. With ten tokens, participants would obtain a chance to initiate a three-point special trial when preferred. At most three self-initiated special trials were allowed per level. The special trials did not appear and could not be initiated during no-feedback levels. This ensured that participants performed the no-feedback levels without any scoring assistance.
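The reward rules above amount to a few threshold checks. The sketch below uses hypothetical function names and rests on two labeled assumptions: each accuracy threshold means "at least", and a special trial pays three points only when answered correctly.

```python
def tokens_for_level(accuracy: float) -> int:
    """Tokens awarded for one level, assuming each threshold means 'at least'."""
    if accuracy >= 0.90:
        return 3
    if accuracy >= 0.75:
        return 2
    if accuracy >= 0.60:
        return 1
    return 0

def trial_points(correct: bool, special: bool) -> int:
    """One point for a correct answer, three on a special trial (assumed only if correct)."""
    if not correct:
        return 0
    return 3 if special else 1

def can_initiate_special(tokens: int, used_this_level: int, feedback_level: bool) -> bool:
    """Ten tokens buy one self-initiated special trial, at most three per level,
    and never on a no-feedback level."""
    return feedback_level and tokens >= 10 and used_this_level < 3

print(tokens_for_level(0.80))               # 2
print(can_initiate_special(12, 0, False))   # False: no-feedback level
```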

Definition of AP

In order to compare the performance of trained AP with that of real-world ‘AP possessors’, we surveyed the literature of AP on Web of Science on 19 April 2017 with the term ‘absolute pitch’ in the topic and identified 110 empirical papers. Unfortunately, there was not a single objective definition of the performance level of ‘AP possessors’. Instead, these papers used highly varied definitions of ‘AP possessors’, including self-report, AP performance measurements, or relative performance on AP tasks between participants (such as 3 SDs higher in AP accuracy than ‘non-AP possessors’). We focused on the 66 publications that defined AP objectively based on AP performance instead of self-report, and found that these papers adopted highly varied performance measures to define ‘AP possessors’ with different cut-off points and scoring methods (e.g., taking semitone errors as correct, partially correct or incorrect; using accuracy or the average size of errors, etc.).

Given this variability in definition, we did not see any strong reason to adopt any single definition of ‘AP possessors’ based on some particular publications. In this study, we arbitrarily adopted a relatively stringent training criterion of AP, i.e., having achieved 90% pitch naming accuracy with all of the 12 pitches and with semitone errors considered incorrect. We asked whether participants who had passed our training would be considered ‘AP possessors’ according to the definition specified in each of the 66 papers, and recalculated the participants’ performance if needed. Results showed that our successfully trained participants would be considered ‘AP possessors’ in 83.3% (55 out of 66) of those papers using objective definitions, meaning that their AP performance was representative of and comparable to that of ‘AP possessors’ as defined in the literature.
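To illustrate how much the scoring rule alone can move a participant's score, the following minimal sketch computes naming accuracy under the three treatments of semitone errors mentioned above. Pitches are coded as pitch classes 0 to 11; the function names, the example responses, and the value 0.5 for partial credit are illustrative assumptions rather than values from any surveyed paper.

```python
def pitch_class_distance(a: int, b: int) -> int:
    """Distance in semitones between two pitch classes on the 12-step circle."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def naming_accuracy(responses, targets, semitone_credit=0.0):
    """Proportion correct; semitone errors earn `semitone_credit` (0, 0.5 or 1)."""
    total = 0.0
    for response, target in zip(responses, targets):
        distance = pitch_class_distance(response, target)
        if distance == 0:
            total += 1.0
        elif distance == 1:
            total += semitone_credit
    return total / len(targets)


def meets_training_criterion(responses, targets, threshold=0.90):
    """Criterion adopted in this study: 90% accuracy, semitone errors incorrect."""
    return naming_accuracy(responses, targets, semitone_credit=0.0) >= threshold


# Hypothetical data: five trials, two of which are off by one semitone.
targets = [0, 2, 4, 5, 7]
responses = [0, 3, 4, 5, 6]
strict = naming_accuracy(responses, targets, semitone_credit=0.0)   # 0.6
partial = naming_accuracy(responses, targets, semitone_credit=0.5)  # 0.8
lenient = naming_accuracy(responses, targets, semitone_credit=1.0)  # 1.0
```

The same five responses score 60%, 80% or 100% depending on the rule, which is why the comparison against each paper's definition required recalculating performance.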

Test for generalization

The test for generalization was performed before and within three days after training to examine how well the pitch-naming abilities generalized to untrained octaves and timbres (Table 1). An identical test was performed a month later to examine whether the AP learning was sustained for at least a month. One hundred and twenty tones in octaves 3 to 6 were used, of which octaves 4 and 5 were trained, and octaves 3 and 6 were untrained. Three timbres were included, with synthetic and piano tones as trained timbres, and violin tones as an untrained timbre. The tones were presented in three conditions: trained octave and timbre, trained octave and untrained timbre, or untrained octave and trained timbre.

A pitch naming task was used (Fig. 2B). During each trial, a tone was presented for 1 s. Then an image that mapped the 12 pitch names to 12 keys of the keyboard, which was the same as that used in the training, was presented. Participants were required to name the pitch of the presented tone by key press within 5 s. Each tone was presented twice, leading to 240 trials in total. The trials were presented in randomized order. Ten practice trials were provided before testing. The dependent measure was the precision of pitch naming, i.e., the average error in semitones of participants’ responses relative to the correct responses. A trial without any response was assigned an error of 3 semitones (the expected error one would have by complete guessing^{Footnote 3}). The average percentage of no-response trials in the pretest, posttest and testing after one month was .066%, .027% and .27% respectively^{Footnote 4}. We used the average error instead of the general naming accuracy because the size of judgment errors captures the precision of an individual's pitch naming, which is more informative than the binary correctness of responses measured by general naming accuracy.
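The dependent measure can be sketched as follows. This is one reading of the measure, assuming that errors are taken as the shortest distance on the 12-step pitch-class circle (so the largest possible error is 6 semitones); under that assumption, uniform guessing gives an expected error of (0 + 1 + 2 + 3 + 4 + 5 + 6 + 5 + 4 + 3 + 2 + 1) / 12 = 3 semitones, which agrees with the value assigned to no-response trials. The function names are hypothetical, and `None` stands for a trial without a response.

```python
from statistics import mean

NO_RESPONSE_ERROR = 3  # semitones assigned to a trial without any response


def semitone_error(response, target):
    """Error in semitones on the pitch-class circle; None means no response."""
    if response is None:
        return NO_RESPONSE_ERROR
    d = abs(response - target) % 12
    return min(d, 12 - d)


def mean_naming_error(responses, targets):
    """Average semitone error across trials (the dependent measure)."""
    return mean(semitone_error(r, t) for r, t in zip(responses, targets))


# Expected error of a uniform random guess over the 12 pitch names.
chance_error = mean(semitone_error(guess, 0) for guess in range(12))
```

Because the no-response penalty equals the chance level, omitting a response is scored no better and no worse, on average, than guessing.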

Results and Discussion

Learning of AP

In general, participants made substantial progress in learning to name pitches (Fig. 3A; see individual progress of learning in Supplementary Fig. 2). At the end of training, they were able to name on average 8.1 pitches (out of 12; range = 5 to 12; SD = 2.28) at 90% accuracy without any externally provided reference tones under the stringent scoring criterion of taking semitone errors as incorrect.

Importantly, one of the ten participants passed all levels of training, meaning that he was able to name all of the twelve pitches at 90% accuracy without any externally provided reference tones. With this level of AP performance, he would be recognized as an ‘AP possessor’ in most of the empirical papers that adopted an objective performance-based definition of AP (see Methods). These results suggest that he acquired AP through perceptual learning in adulthood.

Generalization & sustainability of AP learning

Pitch-naming performance improved and was similar between trained and untrained tones (Fig. 4A). A 2 × 3 ANOVA on pitch naming error with Prepost (pretest / posttest) and Stimulus Type (octave & timbre trained / octave trained & timbre untrained / octave untrained & timbre trained) as factors revealed a significant main effect of Prepost, F(1,9) = 30.3, p < .001, η_p² = .771, with a smaller pitch naming error at posttest than at pretest. No other main effect or interaction was observed, ps > .15, i.e., we did not observe any difference between the naming performance for tones in trained and untrained timbres and octaves (η_p² of the main effect of Prepost was .846, .638 and .485 for the octave & timbre trained, octave trained & timbre untrained, and octave untrained & timbre trained conditions, respectively). These results suggest that the AP learning generalized to untrained octaves and timbres.

To check whether the improvement was sustained for a month, a one-way ANOVA was performed on pitch naming error for the trained tones^{Footnote 5}, with Prepost (pretest / posttest / a month later) as the factor. It revealed a significant main effect of Prepost, F(2,16) = 18.8, p < .001, η_p² = .701. Post-hoc Scheffé tests (p < .05) showed that the pitch naming error decreased after training and remained similar a month later, p > .73 (Supplementary Table 1; η_p² of the main effect of Prepost between the pretest and the immediate posttest was .840 and that between the pretest and the posttest a month later was .675). These results suggest that the AP learning was sustained for a month. Overall, the AP learning corresponded well with classic characteristics of perceptual learning in terms of performance enhancement, generalization and sustainability.
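As a consistency check on the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two main effects of Prepost; the function name is hypothetical, and because the inputs are the rounded F values, agreement is expected only to about three decimals.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F test: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Prepost in the 2 x 3 ANOVA: F(1,9) = 30.3
generalization_effect = partial_eta_squared(30.3, 1, 9)

# Main effect of Prepost in the one-way ANOVA: F(2,16) = 18.8
retention_effect = partial_eta_squared(18.8, 2, 16)

print(round(generalization_effect, 3), round(retention_effect, 3))
```

Both values round to the reported figures (.771 and .701), so the F statistics, degrees of freedom and effect sizes are mutually consistent.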

While we did not include a control group to control for potential learning effects from repeated testing, the improvement observed in the test for generalization is unlikely to be explained by repeated testing per se. First, AP is notoriously difficult to acquire according to the literature. Second, the average progress per hour of training (where at least hundreds of trials were involved) was relatively slow (Fig. 3A). Third, test-retest improvement was limited in our pilot testing. Eleven participants repeated a similar AP task three times in the same session, with an optional break in between. The task was a pitch verification task, in which half of the trials showed the correct label of a presented tone, and the other half of the trials showed a label that was a semitone away from the correct label. Participants had to judge whether the label and the tone matched or not by key press. Each task had 144 trials. Results showed that performance did not differ among the three attempts of the same task, F(1,20) = 1.94, p = .17 (mean d’ = .27, .42 and .30 and SD = .72, .84 and .73 for the three attempts respectively), suggesting that the test-retest improvement in AP judgment was limited. Therefore, simply performing the test for generalization twice is unlikely to have caused the substantial and sustained AP improvement after training.
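For readers unfamiliar with d’, the sketch below shows how sensitivity is commonly computed for a verification task of this kind, using the standard equal-variance formula d’ = z(hit rate) − z(false-alarm rate), with ‘match’ trials treated as signal. The text does not state how d’ was computed in the pilot, so the formula choice, the log-linear correction for extreme rates, the function name and the example counts are all assumptions for illustration.

```python
from statistics import NormalDist


def d_prime(hits: int, n_match: int, false_alarms: int, n_mismatch: int) -> float:
    """Equal-variance d' with a log-linear correction (add 0.5 to counts, 1 to totals).

    hits: 'match' responses on trials where label and tone matched.
    false_alarms: 'match' responses on trials where the label was a semitone off.
    The correction keeps rates away from 0 and 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (n_match + 1)
    false_alarm_rate = (false_alarms + 0.5) / (n_mismatch + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical participant: 144 trials, half matching and half mismatching.
example = d_prime(hits=40, n_match=72, false_alarms=32, n_mismatch=72)

# Responding 'match' equally often on both trial types yields d' = 0.
no_sensitivity = d_prime(hits=36, n_match=72, false_alarms=36, n_mismatch=72)
```

A d’ near zero indicates that matching and mismatching labels were barely distinguished, which is the range of the pilot means reported above (.27 to .42).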

Experiment 2

Experiment 1 showed that it is feasible to acquire AP in adulthood, and that the pattern of AP learning generally fits with that of perceptual learning. In Experiment 2, we aimed to replicate the feasibility of acquiring AP in adulthood and to further characterize such AP learning. First, we tested the robustness of AP acquisition in adulthood by using a different training protocol, including a different set of training tones, training tasks, training duration and design (Table 1 and Fig. 1B). In particular, we used both a pitch verification task and a pitch naming task in the training. Compared with the naming task, the verification task reduced the number of alternatives (which ranged from 3 to 12 in the naming task) to 2 (yes / no) while keeping the perceptual demand of differentiating between tones that are one semitone apart. A smaller number of alternatives reduces the decisional demand, and thus makes the task easier (Y. K. Wong & Wong, 2014). This may further break down AP learning into smaller steps such that AP can be improved more smoothly. Given that many participants were still slowly improving towards the end of the training in Experiment 1 (Supplementary Fig. 2), the training time of Experiment 2 was also extended to 15 hours so as to allow more time for improvement.

Second, Experiment 1 showed that including training tones in multiple octaves and timbres resulted in AP learning that generalized to untrained tones. This result is consistent with the predictions based on the psychological space of training and testing (Y. K. Wong et al., 2011; see also Nosofsky, 1986, 1987; Palmeri & Gauthier, 2004). In Experiment 2, we further tested the predictions of the psychological space by asking whether covering a more specific part of the psychological space, i.e., by using a smaller set of tones in one octave and one timbre only, would lead to higher specificity in AP learning.

Third, we examined whether musicians benefit from the training more than non-musicians. Long-term musical training in musicians has fine-tuned the perceptual representations of tones in the auditory system (Pantev et al., 1998), which may in turn help one establish the association between the tones and the pitch names, i.e., learning AP.