Abstract
The number of people learning Chinese as a Foreign Language (CFL) has increased in recent years. Newcomers, international students, and immigrant spouses all need to improve their Chinese reading fluency and listening comprehension for daily communication and work. However, not everyone has the opportunity to receive formal education in a language school, so informal learning is very important for CFL learners in Taiwan. Novice Chinese learners should first master the skill of grouping Chinese characters into meaningful chunks, i.e., Chinese word segmentation. Consider, for instance, “老師對教育的貢獻” (teachers’ contribution to education). After word segmentation, the sentence changes from the character sequence “老/師/對/教/育/的/貢/獻” to “老師(teachers)/對(P)/教育(education)/的(DE)/貢獻(contribution)”. This study therefore used two Chinese segmentation methods to highlight meaningful and important word chunks in the subtitles of Chinese videos and evaluated their usability for CFL learners. The first method took the top 800 and top 1600 high-frequency words from an analysis report based on the Academia Sinica Balanced Corpus of Modern Chinese, used them to identify word boundaries in video subtitles with the forward maximum matching method, and analyzed the resulting performance. The statistical results show that with the top 800 high-frequency words most Chinese subtitles remain unsegmented (62.3%), which means that this word list is not sufficient to segment the subtitles appropriately. With the top 1600 high-frequency words, approximately 60% of the subtitles in each video are effectively segmented, although numerous unknown words still remain. Active phrases, idioms, and short phrases in Chinese subtitles may make word segmentation difficult; moreover, the usability test of segmentation based on high-frequency words did not yield a significant result.
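The forward maximum matching step of the first method can be illustrated with a minimal sketch. It assumes a greedy longest-match-first scan against a word list; the toy lexicon, the function names, and the character-based coverage measure are hypothetical illustrations and are not taken from the study, whose word lists and exact statistics are not reproduced here.

```python
def forward_maximum_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon entry that matches; otherwise emit a single unmatched character.

    Returns a list of (token, is_known) pairs.
    """
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                tokens.append((candidate, True))
                i += size
                break
        else:
            # No lexicon entry starts here: keep one character as unknown.
            tokens.append((text[i], False))
            i += 1
    return tokens


def coverage(tokens):
    """Share of characters covered by lexicon words (an assumed measure)."""
    total = sum(len(tok) for tok, _ in tokens)
    matched = sum(len(tok) for tok, known in tokens if known)
    return matched / total if total else 0.0


# Hypothetical toy lexicon standing in for a high-frequency word list.
TOY_LEXICON = {"老師", "對", "教育", "的", "貢獻"}

segmented = forward_maximum_match("老師對教育的貢獻", TOY_LEXICON)
print("/".join(tok for tok, _ in segmented))  # 老師/對/教育/的/貢獻
```

A larger word list raises the chance that a longer match exists at each position, which is consistent with the improvement reported when moving from the top 800 to the top 1600 words.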
The second method used a natural language processing technique to split Chinese subtitles into their separate morphemes. The study adopted the CKIP Chinese parser, a word segmentation tool for Chinese, to split subtitles according to their part-of-speech tagging (i.e., grammatical tagging). The statistical results show that 97.26% of the subtitles were split, but the usability test shows that subjective satisfaction was not good enough. To investigate further, we asked subjects to identify “improper” word segmentations. For instance, the subtitle “接受治療很久了” (treated for a long time) is split into “接受/治療/很/久/了”, but most novices think that the proper segmentation should be “接受/治療/很久了”. The “improper” rate is about 22.30% on average. In other words, the segmentation produced by a Chinese parser based on natural language processing is not the best scaffolding for Chinese novices watching videos with Chinese subtitles. The preliminary usability results show that the second method can provide effective scaffolding for novices, but the granularity of the chunked words is sometimes too fine to read fluently (i.e., in less than thirty percent of the results). Consequently, an adaptation mechanism is required so that learners can find a balance between the scaffolding provided by the two methods. For example, Chinese function words such as 很 and 了 serve only grammatical functions (i.e., they have no meaning by themselves), and they should not be separated out from subtitles for learning purposes. Further work is necessary to determine the proper granularity for chunking words, to design an adaptation mechanism for segmentation, and to prevent segmentation errors in new or unknown words.
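One way to picture the coarser granularity that novices preferred is a post-processing pass that re-attaches function words to a neighbouring chunk. The sketch below is only an assumption about what such an adaptation rule might look like; the two word sets and the function name are hypothetical, and the study does not specify an adaptation mechanism.

```python
# Hypothetical rule sets: which function words lean on which neighbour.
ATTACH_TO_NEXT = {"很"}  # e.g. degree adverbs join the following word
ATTACH_TO_PREV = {"了"}  # e.g. particles join the preceding chunk


def merge_function_words(tokens):
    """Merge function words into adjacent chunks of a segmented subtitle."""
    merged = []
    carry = ""  # function words waiting to be prefixed to the next token
    for tok in tokens:
        if tok in ATTACH_TO_NEXT:
            carry += tok
        elif tok in ATTACH_TO_PREV and merged and not carry:
            # Append the particle to the chunk that precedes it.
            merged[-1] += tok
        else:
            merged.append(carry + tok)
            carry = ""
    if carry:
        # A trailing function word with nothing after it stays on its own.
        merged.append(carry)
    return merged


parser_output = ["接受", "治療", "很", "久", "了"]
print("/".join(merge_function_words(parser_output)))  # 接受/治療/很久了
```

Applied to the parser output from the example above, this pass yields the segmentation that most novices judged proper; whether such fixed rules generalize to other subtitles is an open question.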
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chang, CK. (2013). Usability Evaluation of Two Chinese Segmentation Methods in Subtitles to Scaffold Chinese Novice. In: Marcus, A. (eds) Design, User Experience, and Usability. Health, Learning, Playing, Cultural, and Cross-Cultural User Experience. DUXU 2013. Lecture Notes in Computer Science, vol 8013. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39241-2_4
DOI: https://doi.org/10.1007/978-3-642-39241-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39240-5
Online ISBN: 978-3-642-39241-2
eBook Packages: Computer Science, Computer Science (R0)