EmoTan: enhanced flashcards for second language vocabulary learning with emotional binaural narration
We report on the design and evaluation of a flashcard application, enhanced with emotional binaural narration to support second language (L2) vocabulary learning. Typically, the voice narration used in English vocabulary learning is recorded by native speakers with a standard accent to ensure accurate pronunciation and clarity. However, a clear but monotonous narration may not always aid learners in retaining new vocabulary items in their semantic memory. As such, enhancing textual flashcards with emotional narration in the learner’s native language can foster the retention of new L2 words in episodic rather than semantic memory as greater emotive expression reinforces episodic memory retention. We evaluated the effects of binaural emotive narration with traditional textual flashcards on L2 word retention (immediate and delayed) in laboratory experiments with native Japanese-speaking English learners. Our results suggest that the learners were able to retain approximately 60% more L2 words long-term with the proposed approach compared to traditional flashcards.
Keywords: Emotion, Narration, Binaural recording, Flashcard, Computer-assisted language learning, Vocabulary learning
Vocabulary is fundamental to English language learning, as its lack impedes language-related activities such as conversing and multimedia content comprehension (Nation 2006; van Zeeland and Schmitt 2013). To increase vocabulary knowledge, a learner has to encounter the language repeatedly in short spare moments of daily life, and hence, mobile language learning devices are becoming important. The number of mobile language learning studies has been increasing annually (Hwang and Fu 2019), and it has been shown that these applications can help learners improve their language skills (Hwang and Wu 2014). To learn new English words with these mobile devices, learners generally memorize words while listening to the pronunciation in online dictionaries (Weblio 2005; Goo dictionary 1999; Jayme Adelson-Goldstein 2015; American Heritage Dictionary. Online dictionary 1969) or flashcard applications (mikan 2014; Smart Language Apps Limited 2015). The pronunciations recorded by native English speakers have no specific accent and, therefore, are accurate and easy to comprehend. However, although this vocalization is adequate for learners to verify their pronunciation of already-known words, it is not sufficient for understanding and memorizing new words (Wright et al. 2013).
To redesign vocalization for introducing new words, some commercial off-the-shelf language learning applications have been released. Moetan is a pioneering work (Moetan 2008) that replaces the vocalization with the voice of a game character. Released for smartphones and gaming devices, this application allows learners to learn a word through sample sentences and voice. In addition, other stock voice applications that consistently use professional voice actors have also been released (Hiroshi 2014; Ogura H. 2014). It has been suggested that these game-based language applications are useful for maintaining learning motivation or engagement (Hung et al. 2018). However, these applications use monaural voice and merely change the voice actor who vocalizes the flashcard. Such a change alone is not enough for understanding the meaning of the target words or retaining that meaning in long-term memory.
We designed narrations to memorize English words in the emotionally rich episodic memory and not semantic memory, and we described the method of producing the narration.
The effectiveness of the narration using flashcard application was evaluated by comparing our method to the traditional pronunciations of English words. The result implies that learning by the proposed narration method makes it significantly easier to remember an English word and its corresponding translation. However, note that it is not yet clear which variable of this narration affects this result in the present study.
We show that our proposed method could aid in retaining more English words into long-term memory than traditional pronunciation by applying our method to a general English learning style, in which the learners freely memorize words using a typical flashcard application on a tablet until they are satisfied.
Memory-enhancing effect of emotion
Research in cognitive and affective neuroscience has shown that emotive expression enhances the long-term retention of human memory (McGaugh 2003). For instance, Kensinger and Corkin demonstrated that emotive words (e.g., funny, victim, error) are more readily recalled than non-emotive words (e.g., switch, locate, habit) (Kensinger and Corkin 2003). For a paired-associate learning format, Kleinsmith and Kaplan showed that nonsense-syllable-paired associations learned under a high arousal state produced nearly permanent memorization (Kleinsmith and Kaplan 1964). Furthermore, the long-term memorization of a word is likely to be retained merely by placing a neutral word that does not have an emotional meaning into an emotional context (Phelps et al. 1997).
An emotional state can be located in the circumplex model of affect (emotion), with the horizontal axis representing the valence dimension and the vertical axis representing the arousal dimension (Russell 1980). The arousal dimension is regarded as key to the memory-enhancing effect of emotion. Hamann et al. (1999) conducted experiments in which images were shown to subjects to evoke various emotions, and demonstrated that arousing pictures were better remembered than valence (pleasant or unpleasant) ones. They also demonstrated that pictures eliciting a high galvanic skin response, indicative of a high arousal level, were better memorized than others (Hamann et al. 1999).
The hippocampus and peripheral limbic system are believed to be involved in the promotion of long-term memory by means of emotions (Hamann et al. 1999; LaBar and Cabeza 2006). Particularly, the secretion of noradrenaline in the amygdala acts on the most recently engaged synapse of the brain and is involved in the consolidation of all types of learning information (Hamann et al. 1999; LaBar and Cabeza 2006). Noradrenaline in the amygdala is secreted by stimulation due to excitement and stress. Therefore, for example, if a memory test is performed during mild exercise, such as grasping a force meter, the arousal level will increase and the memory may be reinforced (Coles and Tomporowski 2008).
To apply the memory retention effect of emotion to vocabulary learning, it is better to use this retention effect not only for emotive words, such as in the study of Kensinger and Corkin (2003), but also for non-emotive words. For that, it is necessary to stimulate human emotions using external emotional stimulation while learning non-emotive words. International Affective Digitized Sounds (IADS) (Bradley and Lang 2007) and the International Affective Picture System (IAPS) (Lang et al. 2008) have been widely used as emotional stimuli in cognitive and affective neuroscience studies. Nevertheless, an insufficient number of sounds or sound effects exist compared to the size of the English vocabulary, making it difficult to find enough emotional stimuli that match the meaning of English words. Furthermore, since it is a pure experimental stimulus, learning becomes experimental, so it is not practical in assuming a case of daily vocabulary learning.
The major techniques for evoking emotions with voice and sound are 3D sound attraction and Autonomous Sensory Meridian Response (ASMR). These techniques create specific emotive situations or stories by using binaural recording technology. Binaural recording is a method of creating a 3D stereo sound sensation for the listener that is similar to actually being in the room with the performers. This effect is created using a dummy head mannequin outfitted with a microphone in each ear. It has been applied not only in amusement parks (3D sound attraction of Joypolis 2016) but also for controlling moods and chronic pain (Barratt and Davis 2015). However, no methodology has yet been introduced for applying this binaural emotional technology to English vocabulary learning.
Experience-based vocabulary learning
To acquire vocabulary in long-term memory, it is important to deeply understand the meaning of a word and repetitively learn the word to memorize it. To promote word understanding, there is a system that uses pictures (Jayme Adelson-Goldstein 2015) and sentences (Suzuki 2000). However, to promote a more efficient and effective use of vocabulary learning, learning within a brief time spaced throughout the day using mobile devices (micro-learning) has been studied (Cavus and Ibrahim 2009; Gassler et al. 2004). Additionally, there are mobile applications that guide the appropriate review timing based on the Ebbinghaus forgetting curve (Nakata 2015; Luis and von Severin 2011). Nonetheless, executing several repetitions remains necessary.
To mitigate mechanical repetition in vocabulary learning, learning as an experience or episodic memorization has also been actively studied (Nessel and Dixon 2008; WEARABLE LANGUAGE TEACHER ELI 2017). Episodic memory is the memory of an individual’s experiences. It is indexed in a time-space context, such as a location and the surrounding environment. On the other hand, semantic memories are defined as memorized knowledge, where the temporal and spatial information in the learning no longer exist. English Learning Intelligence (ELI) (WEARABLE LANGUAGE TEACHER ELI 2017) is a learning method that involves compiling a diary in English, that is, a written record of daily activities and conversations, which involves the construction of English sentences. Additionally, a vocabulary learning application that uses context information for memorization was developed (Al-Mekhlafi et al. 2009; Dearman and Truong 2012; Hsieh et al. 2007; Ogata and Yano 2004). However, the vocabulary is heavily dependent on the context. To cope with this limitation, the use of multimedia content, such as movies, videos, and television, has been studied. ViVo, for example, is a dictionary that extracts short video clips from movies and television based on keywords from subtitles and combines them with images for learning (Zhu et al. 2017).
It is thus becoming possible to search for the usage of a target word that the learner wants to remember from daily life and multimedia contents as example sentences (WEARABLE LANGUAGE TEACHER ELI 2017; Al-Mekhlafi et al. 2009; Dearman and Truong 2012; Hsieh et al. 2007; Ogata and Yano 2004) or videos (Zhu et al. 2017). Although these sentences include the target word, they were not designed to represent the meaning of the target word. Therefore, it is unclear whether they are sufficient for understanding and memorizing new words. Furthermore, the audio displays for presenting sentences in previous works were seldom in stereophonic sound using the binaural recording technique and were not designed to evoke emotions in learners from the experience. In this paper, we describe a method of producing the audio contents that represent the meaning of the English words and evoke emotion at the same time.
To incorporate emotional stimulation into English vocabulary learning, we focus on the pronunciation used in online dictionaries and flashcard applications on mobile phones. While it may be possible to use international standard emotional sounds, such as IADS (Bradley and Lang 2007), there are limitations on the types of arousal effect sounds. It is thus difficult to respond to the various types of words. For that reason, we originally created the narration of the arousal emotion corresponding to the word.
When an emotional stimulus is presented to a person, the experience episode is strongly memorized. Therefore, it is desirable that the contents of emotional stimuli reflect both the basic and fine meanings of words. Additionally, if one episode is close to another episode in time, memory may interfere (Loftus 1996). Therefore, we attempted to create different scenes and stories for each word. Furthermore, the content needed to represent an emotional arousal stimulus.
Selection of narrator;
Selection of words;
Creation of expressions and scripts;
To create impressive emotional narrations, expressive voice actors were adopted as narrators under the supervision of English native speakers (supervisors) with English teaching experience. To select a narrator, we first conducted an audition based on three criteria: accuracy of pronunciation, ease of being heard, and expressive voice. The narration used for the audition was a sample submitted by each voice acting office; accordingly, each voice actor spoke various lines in English. Specifically, an experiment was conducted by the supervisor as follows. We asked five female and three male candidates from the voice acting offices of four companies to participate in the audition. The supervisor listened to the narration sample and answered a questionnaire. Based on the responses, we chose the voice actor with the highest average score.
First, we selected 1000 words outside the most frequent 9000 word families of the British National Corpus to eliminate the influence of prior knowledge from the experiment. Next, we selected only words of four, five, and six letters (approximately 370 words) since experiments in related works were performed using these word lengths (Nakata 2015; Webb 2007). Additionally, considering the compatibility with our method, we selected words having an action or sound meaning, which are known to influence user psychology (approximately 100 words). According to their meanings, the 100 words were grouped into ten categories: person, emotion, contact, communication, food, object, social, environment, impression, and sound effect. These categories were defined by referencing the 25 nouns and 15 verbs at the top of the Princeton WordNet hierarchy. We selected approximately three words from each category (30 words in total).
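The first two filtering steps described above (excluding frequent word families, then keeping four- to six-letter words) can be sketched as follows. This is only an illustration: the function name `select_candidates` and the toy word lists are hypothetical, and the actual selection drew on the British National Corpus frequency bands, which are not reproduced here.

```python
def select_candidates(words, frequent_families, min_len=4, max_len=6):
    """Keep words that are not in the frequent list and whose length is in range."""
    frequent = {w.lower() for w in frequent_families}
    return [
        w for w in words
        if w.lower() not in frequent and min_len <= len(w) <= max_len
    ]


# Hypothetical toy data, not the corpus lists used in the study.
toy_words = ["honk", "cajole", "table", "schism", "antidisestablishment", "ox"]
toy_frequent = ["table"]

print(select_candidates(toy_words, toy_frequent))
```

The later steps (choosing words with an action or sound meaning and grouping them into ten categories) relied on human judgment and are not modeled in this sketch.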
To maximize the memory trace in episodic memory, we established specific situation parameters of “when,” “where,” “who,” and “how.” However, it seemed difficult to sufficiently express the nuance of the meanings of words only by emotionalizing the pronunciation. We therefore added several seconds of a short story that expressed the meaning of the given word in its context.
Three types of arousal emotions were considered: pleasant arousal expression, unpleasant arousal expression, and neutral arousal expression. Based on these emotions, we brainstormed the representation of stories in collaboration with a content production company (InfoBurn Co., Ltd.). The categorized expressions were obtained and grouped as follows: (1) one person, expressions in which a voice actor acts alone; (2) two persons, expressions of actions with two or more voice actors; (3) approach, expressions in which the voice actor approaches the listener; (4) touch, expressions in which the voice actor touches the listener; (5) circulate, expressions in which the voice actor circulates around the listener; and (6) together, expressions experienced with voice actors. Note that in the case of (2) two persons, the voice actor plays two roles. As she is a professional voice actor, she can perform as characters with different voice tones. The expressions were selected to ensure a balance between these types. The script was determined based on the above points.
List of recorded narrations
Words (recorded length in seconds)
1. One person
Duress (7), peeve (13), barrow (9), feline (6), swig (7)
2. Two persons
Bawl (11), throes (7), bungle (10), schism (7)
3. Approach
Renege (8), pucker (7), aerate (9), cajole (8), navel (8), prank (3), suckle (7)
4. Touch
Maraud (8), pummel (6), rumple (7)
5. Circulate
Loony (14), hornet (7), heckle (8), fondle (9)
6. Together
Bemoan (11), honk (6), fondle (9), snoop (9), grotto (9), bovine (11), mirth (10), rapt (10)
Prototype flashcard application
The application was implemented with an iOS application (Flashcards Deluxe). The user can save words and narrations on his/her personal computer in advance and input correspondences between words and sounds using Excel. By synchronizing it with Dropbox, words and sounds are displayed on the tablet in a flashcard format. The user can thus learn the next word by swiping forward.
Experiment 1: Memorize under a controlled environment
To assess whether the proposed method facilitates the memory retention of English words, we compared the number of words forgotten using the previous method (baseline) and that of the proposed method (proposed) (RQ1). The participants memorized word pairs, that is, an English term and its Japanese translation, by the two methods. Both immediately and 1 week later, the participants retrieved the memorized words in a memory test. To check the emotional state during memorization, we measured the SCR and compared it across the two methods (RQ2). Because the way of memorizing English words may differ for each participant, the memorization time and the number of times a sound was played were controlled in this experiment.
Since we are preparing to release this learning application to the public, we need to define the main target audience for the application. Since we plan to advertise through the voice actor’s Twitter account, we decided to match the demographics of the experiment participants with the demographics of the voice actor’s Twitter fans. Twitter users who tweeted the voice actor’s name were approximately 60% male, and approximately 70% were in their 20s and 30s (using Mieruka analysis tool, Plus Alpha Consulting Co., Ltd.). Thus, we recruited participants who were men in their 20s and 30s.
We conducted experiments with 30 participants. They were hired and compensated through a temporary agency. The participants were all male native Japanese speakers ranging from 20 to 39 years old (M = 27.6, SD = 6.1). Fifteen of the participants had good English knowledge, with Test of English for International Communication (TOEIC) scores above 730 points (A or B level on the TOEIC proficiency scale). The remaining subjects were native Japanese speakers from a relatively lower socioeconomic bracket, who self-declared that they were not adept at memorizing English words. Two out of the 30 participants were excluded from the result as they reviewed the words after the test on the first day.
Vocabulary and materials
To measure the forgetting rate for words unknown to the participants, we selected ten nouns (loony, mirth, navel, prank, barrow, grotto, throes, bovine, hornet, honk) and ten verbs (bawl, maraud, pummel, aerate, rumple, heckle, cajole, swig, suckle, snoop), as shown in Table 1. These words were not included in the most frequently occurring 9000 words of the British National Corpus, with which participants may have been familiar; most Japanese learners do not reach this vocabulary level in their education. The average number of characters was 5.5 (four letters: three words; five letters: five words; six letters: twelve words), following a previous study that verified the memory retention effect of learned words using ten target words of five or six letters. We used the Japanese translations of the words from the Weblio dictionary (Weblio 2005).
The average length of the narration in the proposed method was 8 s (SD = 2 s). The baseline used the voice of an online dictionary (Goo (Goo dictionary 1999); Nippon Telegraph and Telephone resonant). The length of the voice recording was 0.8 s on average (SD = 0.2 s). In a prior comparison of the sound quality of four online dictionaries (Goo (Goo dictionary 1999), American Heritage (American Heritage Dictionary. Online dictionary 1969), Weblio (Weblio 2005), and Oxford Advanced Learner’s (Oxford Learner’s Dictionaries 1948)), Goo had the sound quality closest to that of the proposed method. The audio of the proposed method used a 44.1 kHz sampling rate, 24-bit depth, and two channels, whereas that of the baseline used a 44.1 kHz sampling rate, 32-bit depth, and two channels. Note that the experimenter adjusted the sound volume in advance using the tablet settings to mitigate the difference in volume between the two conditions.
Experimental design and procedure
The learning time per word was set to 30 s; hence, the total time of each memorization phase (baseline and proposed) was 300 s (30 s × 10 words). The sound was played only once, when the word was displayed on the tablet screen.
To assess prior knowledge of the words, a pre-test was first conducted. It consisted of a productive and a receptive test. In the productive test, the participants entered the English term corresponding to the displayed Japanese term, while the receptive test was the reverse. However, in the productive test, to prevent the participants from answering with other semantically similar words, we showed several letters of the term, together with the correct answer for another letter, as a hint. We tested the knowledge of all 20 words in random order.
To become accustomed to both methods, the participants practiced them with six words. The six words did not include words from the actual session. The first three words were learned by the baseline method; the next three words were learned by the proposed method. We requested the following of the participants: “Please actually remember English words with two kinds of voices: one is a normal voice, and the other is a voice actor’s voice. The voice actor’s narration is comprised of approximately 8 s of audio content for you to experience the meaning of the word. To control the approach for remembering the meaning, please do not use memorization techniques, such as the method of loci, equivoque, and other memory strategies. There is a learning time of 30 s for each word. Please remember it while repeating it.”
One week after this experiment, only the memory test was conducted again. We subtracted the number of correct answers after one week from the number of correct answers immediately after the memorization of English words for the “forgetting rate.” We then compared the number of forgotten items by the baseline and proposed methods. The procedure of the memory test was the same as in the previous stage.
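The subtraction described above can be sketched as a small function. One assumption is made loudly here: the difference is divided by the number of studied words so that the result is a proportion, which matches the scale of the means reported in the results but is not stated explicitly in the text. The function name and the example scores are hypothetical.

```python
def forgetting_score(immediate_correct, delayed_correct, n_words):
    """Share of the studied words recalled immediately but lost one week later.

    Assumption: normalizing by n_words is our reading of the reported scale,
    not a formula given in the paper.
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return (immediate_correct - delayed_correct) / n_words


# Hypothetical participant: 8 points immediately, 5 points after one week, 10 words.
print(forgetting_score(8, 5, 10))
```

Because the scoring protocol awards partial credit, the two counts may be fractional (for example 7.5 and 4.25), and the function handles that case unchanged.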
The experiment was conducted over 2 days. The time required for the first day was 28 min and that required for the second day was 15 min. We did not explain to the participants that the same English word memory test would be carried out on the second day; we only explained that the same experiment would be carried out on the second day. This prevented the participants from perceiving the intention of the experiment and reviewing the memorized content between the first-day experiment and the second-day test.
Psychological response measuring and words scoring
During the experiment, electrodes were attached to the participants’ palms to measure the SCR, which was calculated as the difference between the maximum and minimum values during the playing of a sound (Fig. 5 (left)). At the beginning of the experiment, the experimenter instructed the participants not to move their bodies while memorizing to ensure that no SCR artifact was included in the measurement data.
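The SCR value described above, the maximum minus the minimum of the signal while a sound is playing, can be sketched as follows. The data layout (a list of `(time_s, conductance)` pairs) and the function name are assumptions made for illustration; the paper does not describe the recording format or any filtering applied to the signal.

```python
def scr_amplitude(samples, start, end):
    """Max minus min of the conductance samples with start <= time < end."""
    window = [value for t, value in samples if start <= t < end]
    if not window:
        raise ValueError("no samples inside the playback window")
    return max(window) - min(window)


# Hypothetical trace: (time in seconds, skin conductance in arbitrary units).
trace = [(0.0, 2.0), (1.0, 2.1), (2.0, 2.6), (3.0, 2.4), (9.0, 3.5)]

# Playback assumed to run from 0 s to 8 s, so the sample at 9 s is ignored.
print(scr_amplitude(trace, 0.0, 8.0))
```

Restricting the calculation to the playback window is what keeps a later movement artifact, such as the last sample in the toy trace, out of the measure.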
To score the responses, a sensitive scoring protocol was adopted (Barcroft 2002). In this method, a response received 0.25 points if one letter was correct, or if 25% or more and less than 50% of the letters were present (not necessarily in the correct position). Similarly, it received 0.5 points if 25% or more and less than 50% of the letters were correct, or if 50% or more and less than 75% of the letters were present. It received 0.75 points when 50% or more and less than 100% of the letters were correct, or when an extra letter was added.
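A rough sketch of such a partial-credit scorer is given below. It rests on several assumptions that go beyond the text: "correct" is read as a letter in the right position, "present" as a letter occurring anywhere in the target, a fully correct word scores 1.0, an empty answer scores 0, and anything with at least half of the letters present is capped at 0.5 unless the positional rule applies. The function name `partial_credit` is hypothetical, and this is not the authors' scoring code.

```python
from collections import Counter


def partial_credit(response, target):
    """Score a written response against the target word (0, 0.25, 0.5, 0.75, 1)."""
    response = response.strip().lower()
    target = target.strip().lower()
    if not target:
        raise ValueError("target must not be empty")
    if not response:
        return 0.0
    if response == target:
        return 1.0

    n = len(target)
    # Letters that match the target at the same position.
    in_place = sum(1 for r, t in zip(response, target) if r == t)
    # Letters shared with the target regardless of position (multiset overlap).
    present = sum((Counter(response) & Counter(target)).values())

    if in_place / n >= 0.5:
        return 0.75  # includes a complete word with an extra letter appended
    if in_place / n >= 0.25 or present / n >= 0.5:
        return 0.5
    if in_place >= 1 or present / n >= 0.25:
        return 0.25
    return 0.0


for answer in ["honk", "honkk", "hxxx", "xxxh", "zzzz"]:
    print(answer, partial_credit(answer, "honk"))
```

A position-by-position comparison is a simplification: a letter inserted near the start of a word shifts every later letter, so a human rater following the protocol may award more credit than this sketch does.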
Results and discussion
The results of the Mann–Whitney test
To compare the forgetting rate between the baseline and proposed methods, their respective scores were subtracted from one another for each subject (Fig. 6 (right)). To validate data normality, we used the Shapiro-Wilk normality test, which indicated that the data for both the baseline (p = .80) and the proposed method (p = .31) were normally distributed, and we therefore used a paired two-tailed t test for the statistical analyses. The participants’ forgetting rate score for the proposed method (M = 0.28, SD = 0.18) was significantly lower than that for the baseline method (M = 0.35, SD = 0.19): t = 2.36, p < .05, d = 0.41. This suggested that the participants were able to memorize and retain more English words with the proposed method than with the baseline method (RQ1).
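For readers unfamiliar with the paired t test, the statistic can be computed from the per-participant differences as in the sketch below. The scores are invented for illustration and are not the study data; the function name is hypothetical. The p value is omitted because it requires the t distribution, which a statistics package (for example `scipy.stats.ttest_rel`) would supply.

```python
import math
import statistics


def paired_t(baseline, proposed):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(baseline) != len(proposed):
        raise ValueError("both conditions need one score per participant")
    diffs = [b - p for b, p in zip(baseline, proposed)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Invented forgetting scores for five hypothetical participants.
baseline_scores = [0.40, 0.30, 0.50, 0.20, 0.35]
proposed_scores = [0.30, 0.25, 0.35, 0.20, 0.30]

t_value, df = paired_t(baseline_scores, proposed_scores)
print(round(t_value, 2), df)
```

Pairing matters here because each participant learned words under both conditions, so the test operates on within-participant differences rather than on two independent groups.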
According to a feedback form completed after the delayed test, many positive comments were communicated by the participants: “The retention rate is remarkably different between the actor’s voice and the conventional voice”; “I remember the story from the beginning to the end, and I can reproduce the story”; “The actor’s voice is remarkably impressive compared to the conventional one, but it might be difficult to completely memorize English word pairs with one memorization”; “I would like to use it for learning English words if the application is released”; “I remembered vividly that the sound source moved from the left to the right of the headphones, and I also remember I felt a chilling sensation at this point.”
Although our results suggested that the forgetting rate with the proposed method was lower than that with the conventional learning method, there remain some problems that must be addressed in future works. For example, the delayed test score only increased from about 20% for the baseline to about 30% for the proposed method. Most participants responded that the story was clearly remembered. On the other hand, there were some cases in which the recalled Japanese translation was erroneous. For example, the word “loony” was recorded by the voice actor as a “crazy person” circling around the listener. Some subjects associated this meaning with “fear” based on the atmosphere in the recording. Additionally, the narration for the word “honk” contained a sound effect of a car approaching while a horn was sounded, and the voice of the actor cried “danger.” One participant remembered the scream rather than the sound of the horn, and thus associated the meaning with “shout.” It seems that a word whose story suggests multiple Japanese translations is sometimes remembered as having a different meaning.
Additionally, although the stories of English words and their Japanese translations are easy to memorize, the connection with the spelling is often forgotten. Participants mentioned the following: “Although the story and Japanese translation remain, the connection with English words often disappeared”; “For the word ‘cajole’, the impression of the story was too strong, and the memory of the spelling disappeared”; “I could remember the story, but I could not recall which word the story was associated with; if I heard it again I could learn it again and be able to make a firm connection.”
Many learning experiences often use output and reflection to change the experience into knowledge (WEARABLE LANGUAGE TEACHER ELI 2017). However, in this experiment, the narration was listened to only one time with no opportunity to repeat it, making it somewhat different from the general process of experience learning. Therefore, it can be considered that a sufficient learning effect was not observed.
Experiment 2: Memorize under a freeform learning environment
[Table: Points of modification from experiment 1. Rows: learning time per word; can return or not.]
Experimental design and procedure
We conducted the experiment with 30 participants. Two of the 30 participants were excluded from the results because we could not obtain correct answers from them due to print errors on the test sheet. The remaining 28 participants were all male, native Japanese speakers from 20 to 37 years old (M = 25.7, SD = 4.7) who were not included in experiment 1. Similar to experiment 1, all participants were men in their 20s and 30s (see “Participants” subsection in the “Experiment 1: Memorize under a controlled environment” section). All participants were either university students or graduates. Their TOEIC scores were below 860 points. The words and sounds were the same as in experiment 1.
The participants first learned ten words by the baseline approach and then performed an immediate review for converting the experience into knowledge. After that, they performed an immediate retrieval test. The next ten words were learned using the proposed method. To eliminate the influence of the order effect, the word memorization order was reversed between the baseline and proposed methods. To eliminate the influence of word memorability, the words were assigned in random order. Additionally, we set a pre-test phase for understanding the prior knowledge of the participants and a practice phase for eliminating the influence of the practice effect.
Results and discussion
To compare the forgetting rate under the baseline and proposed methods, their respective scores were subtracted from one another for each subject (Fig. 9 (right)). To validate data normality, we used the Shapiro-Wilk normality test, which indicated that the data for both the baseline (p = .98) and the proposed method (p = .99) were normally distributed, and we therefore used a paired two-tailed t test for the statistical analyses. The participants’ forgetting rate score for the proposed method (M = 0.42, SD = 0.22) was significantly lower than that for the baseline method (M = 0.59, SD = 0.18): t = 5.28, p < .05, d = 0.86. This suggested that the participants were able to memorize and retain more English words with the proposed method than with the baseline method in the freeform learning environment.
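As a rough plausibility check on the effect size, Cohen's d can be approximated from the reported means and standard deviations. The paper does not state which variant of d was used, so the pooled-SD formula below is an assumption, and the function name is hypothetical.

```python
import math


def cohens_d_pooled(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root mean square of the two standard deviations.

    Assumption: equal group sizes and the pooled-SD variant of d.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Summary statistics as reported for experiment 2 (baseline vs. proposed).
print(round(cohens_d_pooled(0.59, 0.18, 0.42, 0.22), 2))
```

With the rounded summary statistics this gives roughly 0.85, close to the reported d = 0.86; the small gap may come from rounding of the published means and standard deviations or from a different effect-size formula.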
We observed various participant behaviors during the experiment. Some participants memorized the words by continuously tapping the sound button while others by spelling the words with their finger on a table surface. Furthermore, after memorizing ten words, the participants usually checked their understanding by occluding the Japanese answer by hand. We believe that through this process, their experiences were converted to vocabulary knowledge. Consequently, the average review time for the ten English words was short: 63 s.
In the feedback form completed after the delayed test, many positive comments were communicated by the participants: “The voice with a story was easier for imagining the words than the normal one, thus the words in the actor’s voice remained in memory”; “A stimulating story is easier to remember”; “In case of the actor’s voice, I felt that the voice gives me a chance to remember even if I forget the word”; “Without a narration, I have to learn continuously. With a narration, it was fairly easy to remember, and learning efficiency was good.”
Although the number of forgotten words decreased compared to experiment 1, some participants still forgot the spelling of the English words similar to experiment 1. The memory enhancement effect by emotional stimulation does not promote all the sensory information experienced, but may affect inhibition (arousal-biased competition theory) (Mather and Sutherland 2011). This suppression involves temporal (Strange et al. 2003) and spatial suppression (Kensinger et al. 2007), which means that the memory of the sensory information before and after the emotional stimulation and the surrounding information of the emotional stimulus are inhibited.
Additionally, attention is related to this inhibition: spatiotemporally peripheral information, which receives less attention, tends not to be remembered. Therefore, although a scene where the emotion of the story is high expresses the meaning of the word, it is still necessary to embed the spelling information in that scene. Further, attention may be directed at the auditory information, meaning that the visual information presenting the spelling is suppressed. It is thus necessary to consider a visual presentation that calls attention to the spelling as well.
Time length in seconds of the memorization phase in experiments 1 and 2
Conclusion and future work
In this paper, we proposed a voice-enhanced emotional flashcard application for mobile phones through which a learner can perceive the meaning of English words. Emotional binaural voice narrations were used to enhance L2 vocabulary learning. Comparing the memory enhancement effect of the proposed voice with that of a typical voice in the baseline approach, we found that learning with the proposed voice makes it significantly easier to remember an English word and its translation. However, the spelling was still not memorized. We believe this relates to two aspects: content design theory and arousal-biased competition theory. In future work, we will employ audio content design theory and an attention induction method to reinforce memory retention. Additionally, we will expand the content of the proposed method. Furthermore, we will evaluate this method in an actual learning situation.
In addition, we conducted the two experiments with only men in their 20s and 30s, as they were identified as the main target user group for our application. However, expanding the participant demographics and generalizing the results to women is also important and should be addressed in future work. The IADS study (Bradley and Lang 2007) suggested that emotional reactions induced by a sound differ between males and females. Therefore, the results reported here may not be generalizable to women. However, when women tried this application at a closed event, none of them responded negatively to the voices or the application. Moreover, similar to the experiment participants, they showed emotional reactions of arousal. Therefore, we are planning a controlled experiment with only female participants as future work.
Finally, this paper suggests that emotive story-based binaural narration promotes the memory retention of English words. However, it is not clear which element of this narration contributed to the results. For example, the two experiments conducted in this study include two independent variables, namely an emotive narration versus a non-emotive narration and a story-based narration versus a non-story-based narration. As such, further investigation to better understand how individual factors affect memory retention is needed.
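One way to separate the two factors named above is a 2 × 2 factorial design, in which emotion and story are varied independently. The following Python sketch is illustrative only and is not part of the reported experiments: the function names and level labels are hypothetical, and it assumes that each cell of the design yields a single mean retention score.

```python
from itertools import product

# Hypothetical level labels for the two factors discussed in the text.
EMOTION = ("non-emotive", "emotive")
STORY = ("non-story", "story")


def factorial_conditions():
    """Return all four (emotion, story) narration conditions."""
    return list(product(EMOTION, STORY))


def factorial_effects(cell_means):
    """Compute main effects and the interaction from four cell means.

    cell_means maps each (emotion, story) pair to a mean retention score.
    """
    def mean_at(factor_index, level):
        # Average the cell means over the other factor.
        values = [m for cond, m in cell_means.items()
                  if cond[factor_index] == level]
        return sum(values) / len(values)

    emotion_effect = mean_at(0, "emotive") - mean_at(0, "non-emotive")
    story_effect = mean_at(1, "story") - mean_at(1, "non-story")
    # Interaction: does the story benefit differ between emotive
    # and non-emotive narration?
    interaction = (
        (cell_means[("emotive", "story")]
         - cell_means[("emotive", "non-story")])
        - (cell_means[("non-emotive", "story")]
           - cell_means[("non-emotive", "non-story")])
    )
    return {"emotion": emotion_effect,
            "story": story_effect,
            "interaction": interaction}
```

Comparing only the emotive story-based cell with the non-emotive non-story cell, as a two-condition comparison does, cannot attribute the difference to either factor alone; the two mixed cells are what make the main effects and the interaction estimable.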
We would like to thank M. Kimura and T. Wakamiya for their suggestions with respect to language education. We also extend our thanks to S. Oguni and H. Takumi for their cooperation during the brainstorming and recording process. Finally, we are grateful to A. Hautasaari for carefully proofreading the manuscript. This research was (partially) supported by JST PRESTO (Grant No. JPMJPR1658).
SF contributed to all aspects of this manuscript: conception and design of the study, analysis and interpretation of data, collection and assembly of data, drafting of the article, critical revision of the article for important intellectual content, and final approval of the article. The author read and approved the final manuscript.
I was born in 1986 and received a Ph.D. in engineering from the University of Electro-Communications in 2013. I was a visiting student of Camera Culture Group at MIT Media Lab supported by “Japan Society for Promotion of Science (JSPS) Research Fellowships for Research Abroad,” and a project researcher of Graduate School of Information Science and Technology at the University of Tokyo. I am currently an Assistant Professor of Graduate School of Information Science and Technology at the University of Tokyo and a researcher at Japan Science and Technology Agency (JST) PRESTO. My research interests are IA (Intelligence amplification), virtual reality, entertainment computing, and human emotions.
- Hwang, G.-J., & Wu, P.-H. (2014). Applications, impacts and trends of mobile technology-enhanced learning: a review of 2008-2012 publications in selected SSCI journals. International Journal of Mobile Learning and Organisation, 8(2), 83–95. https://doi.org/10.1504/IJMLO.2014.062346. PMID: 62346. https://www.inderscienceonline.com/doi/abs/10.1504/IJMLO.2014.062346.
- Weblio. (2005). Online dictionary: Weblio, Inc. http://www.weblio.jp/.
- Goo dictionary (1999). Online dictionary. NTT Resonant Incorporated. https://dictionary.goo.ne.jp/.
- Jayme Adelson-Goldstein, N.S. (2015). Oxford Picture Dictionary Monolingual (American English) Dictionary for Teenage and Adult Students (Oxford Picture Dictionary Second Edition). Oxford: Oxford University Press.
- American Heritage Dictionary. Online dictionary (1969). Houghton Mifflin. https://ahdictionary.com/.
- mikan. (2014). English learning application: mikan Co., Ltd. http://mikan.link/.
- Smart Language Apps Limited. (2015). Learn English (US) Flashcards, English learning application: Smart Language Apps Limited. https://itunes.apple.com/us/app/learn-english-usflashcards/id970002864?mt=8.
- Wright, S., Fugett, A., Caputa, F. (2013). Using e-readers and internet resources to support comprehension. Educational Technology & Society, 16(1), 367–379.
- Moetan, DS. (2008). FACTORY Co.: IDEA, Ltd. http://www.ink-chan.com/others.html.
- Hiroshi, O. (2014). Moesta Moerutodaieigojyuku: NOISE FACTORY co.,ltd. https://web.archive.org/web/20080730151146/http://moe-sta.jp/.
- Ogura, H. (2014). Maruoboe Eitango 2600: KADOKAWA CORPORATION.
- McGaugh, J.L. (2003). Memory and emotion: the making of lasting memories. New York City: Columbia University Press.
- Bradley, M.M., & Lang, P.J. (2007). The International Affective Digitized Sounds (2nd edition; IADS-2): affective ratings of sounds and instruction manual. Tech. Rep. B-3.
- Lang, P.J., Bradley, M.M., Cuthbert, B.N. (2008). International Affective Picture System (IAPS): affective ratings of pictures and instruction manual. Technical Report A-8. University of Florida.
- 3D sound attraction of Joypolis (2016). CA Sega Joypolis Ltd. Retrieved from http://tokyo-joypolis.com/language/english/attraction/3rd/hozuki.html.
- Suzuki, Y. (2000). DUO 3.0. Tokyo: ICP Inc.
- Gassler, G., Hug, T., Glahn, C. (2004). Integrated Micro Learning - an outline of the basic method and first results. Interactive Computer Aided Learning, 1–7.
- Luis, A., & von Severin, H. (2011). Duolingo. Retrieved from https://www.duolingo.com/.
- Nessel, D.D., & Dixon, C.N. (2008). Using the language experience approach with English language learners: strategies for engaging students and developing literacy, (p. 171). Thousand Oaks: Corwin Press.
- WEARABLE LANGUAGE TEACHER ELI (2017). monom, 1-10 Group, and HAKUHODO PRODUCT’S. Retrieved from http://eli-talk.com/en/.
- Al-Mekhlafi, K., Hu, X., Zheng, Z. (2009). An approach to context-aware mobile Chinese language learning for foreign students. In 2009 Eighth International Conference on Mobile Business. https://doi.org/10.1109/ICMB.2009.65, (pp. 340–346).
- Dearman, D., & Truong, K. (2012). Evaluating the implicit acquisition of second language vocabulary using a live wallpaper. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/2207676.2208598, (pp. 1391–1400).
- Hsieh, H., Chen, C., Hong, C. (2007). Context-aware ubiquitous English learning in a campus environment. In Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007). https://doi.org/10.1109/ICALT.2007.106, (pp. 351–353).
- Ogata, H., & Yano, Y. (2004). Context-aware support for computer-supported ubiquitous learning. In The 2nd IEEE International Workshop on Wireless and Mobile Technologies in Education, 2004. Proceedings. https://doi.org/10.1109/WMTE.2004.1281330, (pp. 27–34).
- Zhu, Y., Wang, Y., Yu, C., Shi, S., Zhang, Y., He, S., Zhao, P., Ma, X., Shi, Y. (2017). ViVo: Video-Augmented Dictionary for Vocabulary Learning. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17. https://doi.org/10.1145/3025453.3025779. http://dl.acm.org/citation.cfm?doid=3025453.3025779. ACM Press, New York, (pp. 5568–5579).
- Loftus, E.F. (1996). Eyewitness Testimony. Oxford: Oxford University Press.
- Oxford Learner’s Dictionaries. (1948). Online dictionary: Oxford University Press. Retrieved from http://www.oxfordlearnersdictionaries.com/.
- Strange, B.A., Hurlemann, R., Dolan, R.J. (2003). An emotion-induced retrograde amnesia in humans is amygdala- and beta-adrenergic-dependent. Proceedings of the National Academy of Sciences of the United States of America, 100(23), 13626–31. https://doi.org/10.1073/pnas.1635116100.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.