Using Eye-Tracking Technology to Explore the Impact of Instructional Multimedia on CFL Learners’ Chinese Character Recognition


This study investigated differences in Chinese as a Foreign Language (CFL) students’ learning achievement, cognitive load level, and eye-movement patterns when learning three types of Chinese characters (pictographs, compound ideographs, and phono-semantic compounds) with multimedia slides. Thirty CFL students participated in this study. Eye-tracking data, such as participants’ total fixation duration, mean fixation duration, and fixation count while viewing instructional slides, were also recorded. The results showed that CFL learners learned pictograph characters with less cognitive load than compound ideograph and phono-semantic compound characters. Participants were found to fixate more on the character area when inspecting the pictograph than on the other two categories of characters. Meanwhile, fewer fixations were placed on the enlarged character area when text passages served as the major resource for understanding compound ideograph and phono-semantic compound characters. The illustrative nature of pictograph characters seemed to make this character form more explicit than the other categories of Chinese characters. However, using only text information to understand abstract meanings of compound ideographs and phono-semantic compounds may place an additional cognitive demand on CFL learners, resulting in learners spending less time inspecting the character shape. Replacing the text description with videos or animation may provide scenarios in which ambiguity in the meaning of compound ideographs and phono-semantic compounds can be removed, to promote better understanding of these characters for CFL learners. Future studies are recommended to determine if the use of voice-over and video/animation explanations improves the ability to learn Chinese characters.


Mandarin Chinese is currently used by almost one quarter of the world’s population, mainly in Asia. In addition to China, people living in Taiwan and Hong Kong also use Chinese as their official language. An increasing number of people whose native language is not Chinese have chosen to learn Mandarin Chinese as a second language. Although Chinese as a Foreign Language (CFL) is drawing attention worldwide, learners whose native language is not Chinese prefer to learn the language without learning the characters (Chuang and Ku 2011). Many studies have demonstrated that Chinese character recognition (CCR) is one of the most challenging aspects of learning the Chinese language (Du et al. 2015; Shi et al. 2003). Learners whose native languages are not Chinese commonly apply methods to learn Chinese that are similar to those they used to learn their own native languages (Cook 2003; Jiang 2008). Because Chinese orthography is logographic and composed of radicals in two-dimensional squares, learners whose first languages are based on alphabetic writing systems are likely to have difficulty learning Chinese writing and pronunciation (Shen 2004). In a study by Ye (2011), learners were unable to identify accurately the pronunciation of characters by visually examining the characters. In other words, CFL learners have difficulty resolving the connection between phonetics and semantics. As a result, CFL learners often find it difficult to learn Chinese.

Mayer (2001) proposed the Cognitive Theory of Multimedia Learning (CTML) to provide guidelines for effective multimedia learning designs, based on the Cognitive Load Theory proposed by Sweller (1994). Many empirical studies have suggested that traditional instruction can be transformed into multimedia forms to improve learning (Schnotz and Kürschner 2007). As a result, computer-based CCR materials have been developed over the past decade (Mason and Zhang 2017; Wen 2018); however, the learning effectiveness of these materials depends on the presentation format (Chen and Liu 2008). Although multimedia learning studies have shown that using multimedia improves learning outcomes, few studies have examined in-depth how CFL learners process multiple modalities to understand Chinese characters.

Eye-tracking technology has been utilized as one of several physiological means to understand human information processing (Underwood and Radach 1998). Eye movements have been used as the data source by researchers to explore learners’ reading processes (Rayner 1998). Recently, in multimedia studies, the use of eye-tracking technology has become more popular in exploring how multimedia learners process information encoded in different formats (Hyönä 2010). Although some studies have used eye-tracking technology to explore native Chinese learners’ eye movements in reading Chinese text (Jian et al. 2013; Zhou et al. 2012), only a few eye-tracking studies have focused on the learning outcomes of non-native Chinese learners. Additional studies are needed to determine if integrating text information with visuals facilitates Chinese character recognition for non-native Chinese learners. Therefore, this study implemented eye-tracking technology to determine the processes involved in Chinese character recognition by CFL learners, using multimedia materials as the learning source.

The Construction of Chinese Characters

Chinese is one of the oldest languages with a mature writing system, used by a quarter of the world’s population. Philologists have divided Chinese characters into two categories: simple figures (“wen” 文) and compound characters (“zi” 字). In ancient China, the formation principles and definitions of Chinese characters were initially found in The Rites of Zhou (2018). The structure of character components and the interpretation of Chinese characters were compiled by Xu Shen of the Eastern Han Dynasty, known as the “saint of philology”. Xu Shen’s work formed the basis for the six categories of Chinese characters (Zhen 2015). Chinese characters are classified based on the manner in which they were formed. There are six categories of Chinese characters, namely pictographs (象形), ideographic characters (指事), compound ideographs (會意), phono-semantic compounds (形聲), phonetic loan characters (假借), and derivative cognates (轉注) (“Chinese character classification” 2018). Chinese characters are considered to have originated from pictographs and ideographic characters. Compound ideographs and phono-semantic compound characters are composed of basic characters. Derivative cognate and phonetic loan characters are more abstract, complex forms of character compositions.

In ancient China, people created characters by drawing shapes and images of real-life and abstract objects. The category of pictograph comes from the concept of creation. Pictographs are the most basic and earliest category in the development of Chinese characters. In addition, pictographs refer to imitative drafts and rough sketches of real objects. Compound ideographs denote logical aggregates. They are characters in which two or more characters are combined to represent a new meaning. For example, male (男) is composed of farm (田) and power (力). In ancient China, males were the main sources of labor working the fields.

To form phono-semantic compound characters, two or more simple characters are used as the two elements: one indicates the meaning and the other indicates the pronunciation. Over 90% of Chinese characters are phono-semantic compound characters. In the word 媽 (mother), for example, the character on the left side (女) means “female” and also represents the radical, and the character on the right side (馬) indicates its pronunciation. In the word 爸 (father), the character on the top (父) is its meaning, and the character at the bottom (巴) indicates the pronunciation (Wieger 1965).

The structure of characters makes Chinese one of the most difficult languages in the world. Chinese character recognition involves pattern recognition, which can be challenging for CFL learners due to the great number of characters and their complex structures (Chuang and Ku 2011). Due to the difficulty in finding the connection between phonetics and semantics (Ye 2011), many studies have found Chinese character recognition challenging when learning Chinese as a foreign language (Du et al. 2015; Shi et al. 2003). As a result, beginning learners often start with memorizing and understanding the meaning of vocabulary as fundamental learning. In today’s learning environment, strategies such as the integration of instructional multimedia have been utilized to provide multiple sensory modalities of information resources for CFL learners (Lee et al. 2008; Kuo and Hooper 2004). This said, it is important to understand if including multiple forms of information in instructional materials reduces the cognitive load of CFL learners for improved Chinese character recognition.

Characters with explicit and intuitive representations were selected for the study because the participants were beginner Chinese learners. These included pictographs, compound ideographs, and phono-semantic compound characters.

Multimedia and Learning

Using different forms of information to achieve student learning has become a common approach in today’s classrooms. Researchers have focused on the impact of multimedia on learning in recent years. For example, Ainsworth (2006) argued that the three essential factors influencing the effectiveness of multimedia learning are the design of learning activities and instructional messages, multimedia support for learning, and the cognitive tasks required for interacting with multimedia. In today’s multimedia learning environments, various forms of visual information, such as line drawings, graphics, pictures, and animations, are used as supplements to verbal information, to enhance learning. The efficiency of information processing can determine the effectiveness of multimedia learning. As a result, multimedia design can have an impact on learners’ information processing at the superficial level. Mayer (2001) identified multimedia learning as learning from words (including printed words and spoken words) and pictures (including charts and illustrations). Learners can learn more effectively when instructions are designed using both text and pictures (Mayer 2001).

Teaching and learning have changed dramatically due to the development of information technology, through which instructional multimedia can be presented in more flexible ways. One of the most important aims of multimedia learning is to help learners establish a state of mind towards the representation of instructions and, ultimately, to construct new knowledge. However, multimedia information processing can be cognitively demanding. In 1994, Sweller proposed the Cognitive Load Theory, arguing that both learning task difficulty and instructional design impact learners’ cognitive processing of the subject matter. Learning task difficulty produces a so-called “intrinsic cognitive load”, while instructional design creates a cognitive demand in the form of an “extraneous cognitive load”. The remainder of the working memory capacity is used for developing meaning, the so-called “germane cognitive load”. To achieve better learning outcomes, effective instructional designs should reduce the extraneous cognitive load to make room in the working memory for meaning making, i.e., germane load.

Chinese character recognition can be challenging; thus, the intrinsic cognitive load level is high for beginning CFL learners. Adding visual information, such as pictures and drawings, as a supplement to text explanation in Chinese learning may facilitate CFL learners’ understanding of the learning content. Gass proposed an Input-Interaction-Output (IIO) theory (Block 2003), arguing that to acquire a second language, learners need to go through the following processes: input, comprehensible input, noticing, intake, integration, and output. Learners must first select the presented information. However, learners with different prior knowledge levels may comprehend the same resource input differently (Krashen 1985). Therefore, multimedia information is used to help learners build a relationship between the meaning and the word. Multimedia information for vocabulary learning can be presented to learners in various formats such as written text, voice narrations, pictures, animations, and/or videos. Learners have to select and attend to the input resource they are able to comprehend for making connections between the vocabulary and the meaning of the word (Beebe 1985). During the processes of selecting and attending, learners may notice particular errors and take in the best approach, such as choosing how the vocabulary is presented, in order to achieve the best learning result. Lastly, learners are expected to produce the output by assimilating the similarities and differences of the learned vocabulary in order to show what they have learned in a learning task. However, additional visual information may not have an equal effect on the assimilation of the different categories of Chinese characters, as the rules applied vary. As a result, additional studies are needed to understand, from an information processing perspective, if using multimedia materials can help CFL students learn different categories of Chinese characters equally well.

Eye Movement and Cognitive Processes

According to the eye–mind assumption, a viewer’s eye movements can serve as a detailed blueprint of how information is retrieved and processed (Underwood and Radach 1998). Many studies have shown that viewers’ eye movements can be used to understand how viewers construct mental imagery and visual search patterns (Glaholt et al. 2010; Rayner 1990, 1998). Information revealed by eye fixation patterns indicates the processing involved in the viewing of different presentation formats such as texts and pictures (Hegarty 1992; Hegarty and Just 1993). Studies have placed attention on the perception of information encoded in complex visual presentations (de Koning et al. 2010; Kikas 2006; Ozcelik et al. 2009; van Gog and Scheiter 2010). As a result, eye-tracking measures have become important indicators for understanding in depth how multimedia learners cognitively process the information encoded in different modalities.

Eye-tracking indicators, such as the position and the number of fixations, the time to first fixation, the fixation duration, and the saccade length, are commonly used measures. Eye fixations on particular visual areas represent a viewer’s attention to the information that is to be coded and processed by the cognitive system. The time to first fixation refers to the time that elapses from stimulus onset until the viewer’s eyes first fixate on a specific area of interest, and can indicate the viewer’s attention to that area (Rayner 1990). For the areas that viewers find interesting, the time to first fixation would be shorter. The mean fixation duration is usually collected to investigate viewers’ attention on particular areas of the visuals. Saccades refer to the rapid movements of the eyes from one fixation point to another. It is believed that no visual information is processed while the eyes are making saccadic movements. The distance between two successive fixation points is defined as the saccade length. Saccades and saccade lengths form patterns with respect to how viewers process visual information.
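The indicators described above can be derived from a chronological list of fixations. The following Python code is a minimal sketch of those definitions and is not the software used in this study; the Fixation record, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional


@dataclass
class Fixation:
    """One fixation. Field names are illustrative assumptions."""
    x: float             # horizontal gaze position in pixels
    y: float             # vertical gaze position in pixels
    start_ms: float      # onset time, measured from stimulus onset
    duration_ms: float   # how long the eyes stayed on this point
    aoi: Optional[str]   # label of the area of interest hit, or None


def aoi_metrics(fixations: List[Fixation], aoi: str) -> Dict[str, Optional[float]]:
    """Fixation count, total and mean fixation duration, and time to
    first fixation for a single area of interest."""
    hits = [f for f in fixations if f.aoi == aoi]
    total = sum(f.duration_ms for f in hits)
    return {
        "fixation_count": len(hits),
        "total_fixation_duration_ms": total,
        # Mean and time to first fixation are undefined without any hit.
        "mean_fixation_duration_ms": total / len(hits) if hits else None,
        "time_to_first_fixation_ms": min(f.start_ms for f in hits) if hits else None,
    }


def saccade_lengths(fixations: List[Fixation]) -> List[float]:
    """Euclidean distance between each pair of successive fixation points."""
    ordered = sorted(fixations, key=lambda f: f.start_ms)
    return [hypot(b.x - a.x, b.y - a.y) for a, b in zip(ordered, ordered[1:])]


# Hypothetical example: three fixations on one slide.
example = [
    Fixation(100, 100, 0, 200, "description"),
    Fixation(400, 500, 250, 300, "character"),
    Fixation(400, 100, 600, 100, "character"),
]
print(aoi_metrics(example, "character"))
print(saccade_lengths(example))
```

Returning None for the mean fixation duration and the time to first fixation when an area was never fixated keeps "not looked at" distinct from a value of zero, which would otherwise distort averages across participants.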

Chinese characters are in pictorial form and are classified into different categories, depending on how they are formed. Lacking sufficient prior knowledge and unable to relate the characters to an alphabetic system, beginning CFL learners may interact differently with the learning materials when learning different categories of Chinese characters. Pictures and texts have been used to help CFL learners in Chinese character recognition. Beginning CFL learners may find it more useful to apply pictures to understanding a pictograph than the other categories of characters, as pictograph characters originate from drawings. Furthermore, using enlarged characters as pictures alongside a text explanation may help beginner CFL learners to understand the meaning of compound ideographs and phono-semantic compound characters, as these Chinese characters are closer to a graphic form. However, few studies have examined the relationship between the understanding of Chinese characters and how learners make use of multimedia information from learning materials. Effective instructional design requires further research to understand how CFL learners interact with the learning materials and how the interactivity affects their understanding of the meaning of Chinese characters. Therefore, this study sought to investigate differences in CFL students’ learning achievement, cognitive load level, and eye-movement patterns when learning different categories of Chinese characters.


This study used eye-movement data, achievement measures, and cognitive load surveys to investigate whether CFL learners showed different levels of cognitive load, learning achievement, and patterns of information processing when learning different categories of Chinese characters with multimedia materials. Details of the research methodology utilized in the study are given in the following sections.

Participants and Design

Thirty exchange postgraduate CFL students participated in this study. Participants were expected to be either novices or beginner Chinese learners who had less than one month of Chinese learning experience. A repeated measures design was applied to test the treatment effects. The type of Chinese characters served as the independent variable. CFL learners’ self-reported cognitive load level, learning achievement, and eye-movement patterns were the dependent variables for this study.

Instructional Materials

Chinese characters were selected from three categories (pictographs, phono-semantic compounds, and compound ideographs) to serve as content knowledge. Thirty slides were developed as learning materials for the study. Each category was composed of ten slides, and each slide presented the meaning and pronunciation (pinyin) of one Chinese character using both text and an illustration. Illustrations of the enlarged character, meaning of the character in English, and pronunciation (pinyin) were the basic elements presented on each slide. The pictograph category slides were integrated with visual illustrations to represent the meanings of the characters. Text explanations were used to depict the meanings of phono-semantic compound and compound ideograph characters (Figs. 1, 2, 3). The participants were allowed to view each slide for 20 s. The three categories were shown in random order to minimize the possibility of a “carry-over” effect. Immediately after finishing a category, a cognitive load rating page was presented to the learners to collect their cognitive load level.

Fig. 1

An example slide for a pictograph character

Fig. 2

An example slide for a phono-semantic compound character

Fig. 3

An example slide for a compound ideograph character
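The presentation procedure described above (three categories in random order, ten slides per category, 20 s per slide, and a cognitive load rating after each category) can be sketched as a simple schedule builder. This is an illustrative Python sketch only: the function names, the per-participant seed, and the fixed slide order within a category are assumptions, not details reported in the study.

```python
import random
from typing import List, Tuple

CATEGORIES = ["pictograph", "compound ideograph", "phono-semantic compound"]
SLIDES_PER_CATEGORY = 10
SECONDS_PER_SLIDE = 20


def build_schedule(seed: int) -> List[Tuple[str, str]]:
    """Return (event, category) pairs for one participant.

    The category order is shuffled; using one seed per participant is an
    assumption made here so that a session can be reproduced.
    """
    rng = random.Random(seed)
    order = CATEGORIES[:]
    rng.shuffle(order)
    schedule = []
    for category in order:
        # Slide order inside a category is kept fixed (an assumption).
        for slide in range(1, SLIDES_PER_CATEGORY + 1):
            schedule.append((f"slide {slide}", category))
        # The rating page follows immediately after each category.
        schedule.append(("cognitive load rating", category))
    return schedule


def viewing_time_seconds(schedule: List[Tuple[str, str]]) -> int:
    """Total slide viewing time, excluding the rating pages."""
    return sum(SECONDS_PER_SLIDE for event, _ in schedule if event.startswith("slide"))


schedule = build_schedule(seed=1)
print(len(schedule), viewing_time_seconds(schedule))
```

Under these assumptions each participant sees 33 screens (30 slides and 3 rating pages), and the slides account for 600 s of viewing time.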

Instruments and Apparatus

A paper-based pretest was given to participants to measure their prior understanding of Chinese characters. This paper-based test took participants approximately 10 min to complete. The purpose of the pretest was to ensure that participants had similar language backgrounds. The participants’ cognitive load level was identified using a mental effort survey (Paas 1992; Paas and Van Merrienboer 1994). Participants were required to rate, using a nine-point scale ranging from 1 (very, very low effort) to 9 (very, very high effort), how much mental effort they had invested in studying the materials. An achievement test measuring participants’ retention of the Chinese characters was developed for this study. It contained 18 matching items. Six items tested whether participants could match the pictograph characters with their representative illustrations. Another six items asked participants to match the compound ideograph characters with the text describing the characters. The remaining six items required participants to match the explanations with their associated phono-semantic compound characters (see Appendix).

The participants’ eye movements were recorded by a Tobii T120 eye-tracker (Tobii Pro China, Shanghai, China). The system uses cameras beneath a 17-inch liquid crystal display to track user eye movements. Data were collected with Tobii Studio software (Tobii Pro). Areas of interest (AOIs) were defined for the corresponding components of the illustrated slides. The eye-tracking system provided a non-intrusive means for collecting eye-tracking indicators, including such details as the number of fixations in different AOIs, the total duration of fixations in different AOIs, and the time taken by viewers to relocate their fixations toward particular AOIs.
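To illustrate how fixations can be attributed to areas of interest, the following Python sketch tests whether a gaze point falls inside a rectangular AOI. It is an assumption-laden illustration rather than the procedure used by the eye-tracking software: the rectangle coordinates and the assign_aoi helper are hypothetical, and only the three AOI labels (character, description, and pronunciation) are taken from this study.

```python
from typing import Dict, Optional, Tuple

# (left, top, right, bottom) in screen pixels. These rectangles are
# hypothetical and do not reproduce the actual slide layout.
Rect = Tuple[float, float, float, float]

SLIDE_AOIS: Dict[str, Rect] = {
    "character": (100, 100, 500, 500),
    "description": (600, 100, 1180, 500),
    "pronunciation": (100, 600, 500, 760),
}


def assign_aoi(x: float, y: float, aois: Dict[str, Rect] = SLIDE_AOIS) -> Optional[str]:
    """Return the name of the first AOI containing the point (x, y),
    or None when the point lies outside every AOI."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None


print(assign_aoi(300, 300))
print(assign_aoi(50, 50))
```

Because the example rectangles do not overlap, the order in which they are checked does not matter; overlapping AOIs would need an explicit priority rule.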

Data Analysis

The mean scores from students’ performance tests and the self-reported cognitive load results were used as the data source for research purposes. Meanwhile, the eye-movement information collected by the eye-tracking system (fixation count, fixation durations, and the time to first fixation on either the text or a picture area) was utilized as an indicator of the users’ attention and information processing patterns. A repeated measures analysis of variance (ANOVA) was implemented to analyze differences in the aforementioned measures caused by recognizing different types of Chinese characters. The information processing patterns were analyzed qualitatively and compared across treatments in an attempt to identify any differences in cognitive processes due to differences in Chinese character type.
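For readers unfamiliar with the analysis, the F statistic of a one-way repeated measures ANOVA can be computed as in the following Python sketch. It is a textbook formulation written for illustration, not the statistical software used in this study; the function name and the small data set are hypothetical, and no sphericity correction is applied.

```python
from typing import List, Tuple


def repeated_measures_anova(scores: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated measures ANOVA.

    scores[i][j] is participant i's value under condition j (here, a
    character category). Returns (F, df_conditions, df_error).
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of conditions
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subj_means)
    # Between-participant variability is removed from the error term.
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f_value, df_conditions, df_error


# Hypothetical ratings from three participants under three conditions.
toy_scores = [[1, 2, 3], [2, 3, 5], [3, 5, 6]]
print(repeated_measures_anova(toy_scores))
```

With three conditions and complete data from 30 participants, this formulation gives 2 and 58 degrees of freedom. Obtaining a p value additionally requires the F distribution, which is not included in this sketch.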

Results and Discussion

Cognitive Load Level and Achievement

The results from the repeated measures ANOVA showed a significant difference in cognitive load level among the three categories of characters (F = 7.159, p < .05; see Table 1). The mean cognitive load rating was 3.92 (standard deviation, SD = 1.41) for pictographs, 5.33 (SD = 1.24) for compound ideographs, and 4.46 (SD = 1.84) for phono-semantic compound characters. Participants invested the greatest mental effort in learning compound ideographs. Among the three categories, understanding pictograph characters required the least amount of mental effort. The illustrative nature of pictograph characters seemed to provide explicit meaning to the learners. It is likely that a greater amount of mental effort is required to relate a compound ideograph character to its abstract text description.

Table 1 Mean scores and standard deviations of cognitive load level in different categories

No significant differences were found among the three categories of characters in participants’ pretest performances (F = 0.15, p = .86). The results from the pretest showed that participants possessed the same knowledge level for the three categories of Chinese characters. Echoing the findings from the cognitive load survey, participants performed the best on post-test pictograph items among the three categories of characters (F = 18.57, p < .001; see Table 2). However, no significant differences were found in post-test performance between phono-semantic compounds and compound ideographs. Again, the illustrative nature of the characters accompanied by the picture illustrations seemed to promote an understanding of pictograph characters.

Table 2 Mean scores and standard deviations of post-test in different categories

Total Fixation Duration

Eye-movement measures showed a significant difference in total fixation duration among the three categories of characters (F = 114.75, p < .001; see Table 3). Participants were found to fixate longer on the compound ideograph and phono-semantic compound slides than on pictograph slides. However, no significant difference between phono-semantic compound and compound ideograph slides was found in total fixation duration. Research has shown that, compared with processing graphical information, reading text paragraphs requires more time for linear processing of text information (Rayner et al. 2015). As a result, participants were likely to spend more time processing slides of phono-semantic compound and compound ideograph characters due to the text information used to explain the characters.

Table 3 Means and standard deviations for total fixation duration (in milliseconds) in three categories of slides

To understand further how participants process different kinds of learning material, eye-movement data on different components of the slides were compared.

Description Area

The fixation count on the description area of the pictographs was significantly lower than that on the phono-semantic compound and compound ideograph slides (see Table 4). No differences between the phono-semantic compound and compound ideograph slides regarding fixation count on the description area were found. The results also indicated that viewers showed a greater mean fixation duration on the description area of the pictographs than on that of the other categories of slides. According to the findings, viewers demonstrated fewer fixations but a longer mean fixation duration when viewing picture descriptions as opposed to text descriptions (Fig. 4). Reading text information is a linear process that requires viewers to focus line-by-line on the text information (Rayner et al. 2015). In contrast, viewers usually employ a more global information processing approach when inspecting pictures (Kriz and Hegarty 2007). As a result, it is not surprising that viewers showed a lower fixation count on the picture descriptions of the pictograph slides. Considering the longer mean fixation duration and the lower cognitive load level on pictograph slides, learning pictograph characters with picture descriptions is likely to promote more efficient information processing, for better learning outcomes.

Table 4 Means and standard deviations for fixation counts and mean fixation duration (in milliseconds) in description area of three categories of slides

Fig. 4

Fixation distributions on phono-semantic compound (left) and pictograph (right) slides

Character Area

On the character area of the slides, no significant differences among the three categories were found regarding the mean fixation duration (see Table 5). Significant differences were identified in a comparison of the fixation count of the character areas of the three categories of slides. Participants showed significantly more fixations on pictographs than on the other two categories of slides. No differences between phono-semantic compound and compound ideograph slides regarding fixation count on the character area were evident. The results indicated that the graphical nature of the pictograph characters drew more of the viewers’ attention toward the character areas and, therefore, the viewer tended to fixate more on the illustrated pictograph characters. Meanwhile, pictures were used to make sense of the characters for the pictographs, due to their graphical origin. Given a limited time duration, viewing the picture descriptions probably spared the viewer time for inspecting the character area.

Table 5 Means and standard deviations for fixation counts and mean fixation duration (in milliseconds) in character area of three categories of slides

Pronunciation Area

Viewers showed greater fixation count on the pronunciation area of the pictograph than on the other categories of slides (see Table 6). No significant differences among the three categories of slides regarding the mean fixation duration on the pronunciation area were observed. Again, it is likely that viewers were able to save time by first looking at the picture description and then examining the pronunciation area. However, we were unable to determine if inspecting the pronunciation area promoted correct pronunciation of the characters, as this was not something that we tested for in this study.

Table 6 Means and standard deviations for fixation counts and mean fixation duration (in milliseconds) in pronunciation area of three categories of slides

Eye fixations indicated the information to be encoded and processed by the cognitive system. Studies have reported eye-movement patterns when dealing with information encoded by a combination of formats (onscreen text and illustrations), based on viewers’ eye fixations (Hegarty 1992; Hegarty and Just 1993). The fixation duration and the pattern of the eye gaze can be used to track the level of information processing (Renshaw et al. 2004). Compared to the text descriptions used in the other categories of slides, the unambiguous nature of the illustrative pictograph characters seemed to demand less cognitive effort when it came to making connections between the character and the illustrative description, which resulted in better post-test performance.

The fixation point distribution on each of the slides seemed to indicate that participants fixated more on the characters when inspecting the pictograph than on the other two categories of characters (Fig. 5). Specifically, on the phono-semantic compound and compound ideograph slides, participants spent more time reading the text description than focusing on the character itself. Given the limited time available for viewing the slides, fewer fixations were placed on the character area when the text passage served as the major resource for understanding phono-semantic compound and compound ideograph characters. This finding echoes that of Hegarty and Just (1993), who reported that viewers examine pictures after finishing onscreen text reading. When viewing pictograph slides, participants focused more on the picture-like character, due to the use of picture descriptions (Fig. 5). As a result, more fixations were placed on the character areas of the pictograph slides than on the slides of the other two categories of characters.

Fig. 5

A participant’s scan path and fixation distribution on compound ideograph (left) and pictograph (right) characters presentation

Given the limited time allowed for viewing the slides, it may be challenging for beginner CFL learners to understand and correctly recognize Chinese characters. The cognitive load survey and eye-tracking data indicated that it was easier for CFL learners to learn pictograph characters; this was further confirmed in post-test performance results, which also showed an advantage regarding the learning of pictograph characters. For CFL learners, the illustrative nature of pictograph characters seemed to make them less ambiguous than phono-semantic compound and compound ideograph characters. Understanding abstract phono-semantic compound and compound ideograph characters with text descriptions may be cognitively demanding, as viewers are required to use solely the visual channel to process both the character and text description areas. According to Mayer’s (2001) dual channel assumption, replacing onscreen text with voice-over descriptions may reduce viewers’ cognitive workload because viewers would be able to process the information with more working memory capacity using both visual and verbal channels. Meanwhile, the text descriptions may not be able to fully describe the character meaning, especially for first-time learners. For phono-semantic compounds, in particular, utilizing videos or animation instead of the text description to provide scenarios for the ambiguous meaning of characters may help CFL learners to understand the meaning of the characters. However, the rich information encoded in video/animation may also impose a relatively high cognitive demand with regard to information processing. In such cases, it would be ideal to give viewers ample time to inspect the instructional materials. Future studies are required to determine if using voice-over and video/animation improves the understanding of Chinese characters, particularly for the more abstruse phono-semantic compound and compound ideograph characters.

Based on the findings of the study, the first suggestion for designing a Chinese learning multimedia environment is that designers should make use of different modalities to facilitate Chinese character learning. In this study, CFL learners learned better when the characters were accompanied by explanatory pictures. Using pictures to explain the meaning of the characters resulted in more effective information processing patterns and lower cognitive load levels. However, different categories of Chinese characters seemed to require different formats of explanatory presentation to help beginning CFL learners understand their meaning. Therefore, the second suggestion for developing effective Chinese learning multimedia is to provide alternative formats of representation based on the nature of the character. For learners to understand more abstract characters such as phono-semantic compounds and compound ideographs, alternative formats of explanatory information such as videos or animations may be needed. In addition to static pictures, videos and animations can be used to provide visual prompts that show how the characters originated. Videos and animations can also supply episodic cues that make the abstract meaning of the characters more understandable. To design effective multimedia environments for Chinese character learning, designers should consider the underlying mechanisms of human cognition as well as the nature of Chinese characters. CFL learners will then be able to understand and memorize the meaning of Chinese characters by making connections between their prior knowledge and different formats of multimedia information.

For beginning CFL learners whose native language is based on an alphabetic writing system, Chinese characters may simply be meaningless novel visuals. In traditional classroom teaching, teaching aids such as flashcards are needed to help highlight the shape, pronunciation, and meaning of different characters. The results of the study revealed that flashcard-style teaching aids worked best for teaching pictographs because of the visual nature of these characters. The results also showed that understanding the implicit meanings of phono-semantic compound and compound ideograph characters demands more cognitive resources. As a result, in traditional classroom teaching of Chinese characters, additional descriptions and explanations from instructors are still required to make the abstract meanings of these categories explicit for beginning CFL learners. Meanwhile, information technology has been widely accepted either as a supportive means for traditional classrooms or as the major source for self-paced online learning. Flashcard-style learning aids can therefore be presented in the form of onscreen slides that help novice CFL learners understand pictograph characters. In addition to the verbal information provided by the instructor, dynamic visuals such as animations and videos may be used to provide detailed contexts and episodes for understanding phono-semantic compound and compound ideograph characters. In sum, the findings of the study provide suggestions that can be applied to both online and traditional classroom teaching of Chinese characters to beginning CFL learners.


For CFL learners, both the nature of the Chinese characters and the format of information play important roles in determining the effectiveness of Chinese character learning. Pictures are the best format of information for helping CFL learners to learn pictograph characters, as they match the illustrative nature of these characters most closely. Meanwhile, text passages did not seem to promote successful understanding of phono-semantic compound or compound ideograph characters, as these characters seemed to be ambiguous to beginner learners. Other forms of supportive information, such as videos and animation, may provide explicit scenarios that help CFL learners make connections between the characters and the dynamic visual descriptions. The results of the study can be applied to either online or traditional classroom contexts that target novice-level Chinese character learning. Future studies are needed to explore the effectiveness of videos and animations in helping CFL learners understand Chinese characters.


  1. Ainsworth, S. (2006). DeFT: A conceptual framework for considering learning with multiple representations. Learning and Instruction, 16(3), 183–198.

  2. Beebe, L. (1985). Input: Choosing the right stuff. In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 404–414). Rowley, MA: Newbury House.

  3. Block, D. (2003). The social turn in second language acquisition. Washington, D.C.: Georgetown University Press.

  4. Chen, H.-Y., & Liu, K.-Y. (2008). Web-based synchronized multimedia lecture system design for teaching/learning Chinese as second language. Computers & Education, 50(3), 693–702.

  5. Chinese character classification. (2018). In Wikipedia. Retrieved November 13, 2018, from

  6. Chuang, H.-Y., & Ku, H.-Y. (2011). The effect of computer-based multimedia instruction with Chinese character recognition. Educational Media International, 48(1), 27–41.

  7. Cook, V. (2003). Introduction: The changing L1 in the L2 user's mind. In V. Cook (Ed.), The effects of learning a second language on the first: The case of increased metalinguistic awareness (pp. 1–18). Clevedon: Multilingual Matters.

  8. de Koning, B. B., Tabbers, H. K., Rikers, R. M. J. P., & Paas, F. (2010). Attention guidance in learning from a complex animation: Seeing is understanding? Learning and Instruction, 20(2), 111–122.

  9. Du, J., Zhai, J. F., Hu, J. S., Zhu, B., Wei, S., & Dai, L. R. (2015). Writer adaptive feature extraction based on convolutional neural networks for online handwritten Chinese character recognition. Paper presented at 13th international conference in Tubis.

  10. Glaholt, M. G., Wu, M.-C., & Reingold, E. M. (2010). Evidence for top-down control of eye movements during visual decision making. Journal of Vision, 10, 1–10.

  11. Hegarty, M. (1992). The mechanics of comprehension and comprehension of mechanics. In K. Rayner (Ed.), Eye movements and visual cognition. Springer series in neuropsychology (pp. 428–443). New York: Springer.

  12. Hegarty, M., & Just, M. A. (1993). Constructing mental models of machines from text and diagrams. Journal of Memory and Language, 32, 717–742.

  13. Hyönä, J. (2010). The use of eye movements in the study of multimedia learning. Learning and Instruction, 20(2), 172–176.

  14. Jian, Y.-C., Chen, M. L., & Ko, H. W. (2013). Context effects in processing of Chinese academic words: An eye-tracking investigation. Reading Research Quarterly, 48(4), 403–413.

  15. Jiang, X. (2008). Research on words of teaching Chinese as a second language and learning to read. Beijing: Beijing Language and Culture University Press.

  16. Kikas, E. (2006). The effect of verbal and visuo-spatial abilities on the development of knowledge of the Earth. Research in Science Education, 36(3), 269.

  17. Krashen, S. D. (1985). The input hypothesis: Issues and implications. London: Longman.

  18. Kriz, S., & Hegarty, M. (2007). Top-down and bottom-up influences on learning from animations. International Journal of Human-Computer Studies, 65(11), 911–930.

  19. Kuo, M.-L., & Hooper, S. (2004). The effects of visual and verbal coding mnemonics on learning Chinese characters in computer-based instruction. Educational Technology Research and Development, 52, 23–34.

  20. Lee, C.-P., Shen, C.-W., & Lee, D. (2008). The effect of multimedia instruction for Chinese learning. Learning, Media and Technology, 33(2), 127–138.

  21. Mason, A., & Zhang, W. (2017). An exploration of the use of mobile applications to support the learning of Chinese characters employed by students of Chinese as a foreign language.

  22. Mayer, R. E. (2001). Multimedia learning. New York: Cambridge University Press.

  23. Ozcelik, E., Karakus, T., Kursun, E., & Cagiltay, K. (2009). An eye-tracking study of how color coding affects multimedia learning. Computers & Education, 53(2), 445–453.

  24. Paas, F. G. (1992). Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. Journal of Educational Psychology, 84(4), 429–434.

  25. Paas, F. G. W. C., & Van Merriënboer, J. J. G. (1994). Variability of worked examples and transfer of geometrical problem-solving skills: A cognitive-load approach. Journal of Educational Psychology, 86(1), 122–133.

  26. Rayner, K. (1990). Eye movements and visual cognition: Introduction (pp. 1–7). New York: Springer-Verlag New York Inc.

  27. Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.

  28. Rayner, K., Abbott, M., Schotter, E., Belanger, N., Higgins, E., Leinenger, M., et al. (2015). Keith Rayner eye movements in reading data collection. UC San Diego Library Digital Collections.

  29. Renshaw, J. A., Finlay, J. E., Tyfa, D., & Ward, R. D. (2004). Understanding visual influence in graph design through temporal and spatial eye movement characteristics. Interacting with Computers, 16, 557–578.

  30. Rites of Zhou. (2018). In Wikipedia. Retrieved December 10, 2018, from

  31. Shen, H. (2004). Level of cognitive processing: Effects on character learning among non-native learners of Chinese as a foreign language. Language and Education, 18(2), 167–182.

  32. Shi, D., Damper, R. I., & Gunn, S. R. (2003). An approach to off-line handwritten Chinese character recognition based on hierarchical radical decomposition. Journal of Quantitative Linguistics, 10(1), 41–69.

  33. Schnotz, W., & Kürschner, C. (2007). A reconsideration of cognitive load theory. Educational Psychology Review, 19, 469–508.

  34. Sweller, J. (1994). Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4(4), 295–312.

  35. Underwood, G., & Radach, R. (1998). Eye guidance and visual information processing: Reading, visual search, picture perception and driving. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 1–27). Oxford: Elsevier.

  36. van Gog, T., & Scheiter, K. (2010). Eye tracking as a tool to study and enhance multimedia learning. Learning and Instruction, 20(2), 95–99.

  37. Wen, Y. (2018). Chinese character composition game with the augment paper. Educational Technology & Society, 21(3), 132–145.

  38. Wieger, L. S. J. (1965). Chinese characters: Their origin, etymology, history, classification and signification. A thorough study from Chinese documents 2nd ed. English and revision according to the 4th French ed. New York: Paragon Book Reprint Corp.

  39. Ye, L.-J. (2011). Teaching and learning Chinese as a foreign language in the United States: To delay or not to delay the character introduction (Doctoral dissertation, College of Arts and Sciences, Georgia State University). Retrieved November 17, 2018, from

  40. Zhen, Z. (2015). Advantages and thinking on design of Chinese characters’ graphics. Studies in Literature and Language, 11(2), 82–87.

  41. Zhou, P., Su, Y. E., Crain, S., Gao, L., & Zhan, L. (2012). Children’s use of phonological information in ambiguity resolution: A review from Mandarin Chinese. Journal of Child Language, 39, 687–730.



This research project was supported in part by the Ministry of Science and Technology, Taiwan; Grant No. MOST 106-2511-S-415-009-.

Author information



Corresponding author

Correspondence to Han-Chin Liu.


Cite this article

Liu, HC. Using Eye-Tracking Technology to Explore the Impact of Instructional Multimedia on CFL Learners’ Chinese Character Recognition. Asia-Pacific Edu Res 30, 33–46 (2021).



  • Chinese as a foreign language (CFL)
  • Chinese character recognition
  • Eye-tracking technology
  • Multimedia learning