Abstract
Multi Lingual Snippet Generation (MLSG) systems provide users with snippets in multiple languages. However, collecting and managing documents in multiple languages efficiently is difficult, which complicates the process. This requirement can instead be met by translating snippets from one language to another with a Machine Translation (MT) system. The resulting system is called a Cross Lingual Snippet Generation (CLSG) system. This paper presents the development of a CLSG system based on snippet translation for the case where documents are available in only one language. We consider the English-Bengali language pair and translate snippets in one direction (English to Bengali). The work concentrates on translating snippets with simpler MT techniques and leaves out deeper MT concepts. In the experiments, an average BLEU score of 14.26 and a NIST score of 4.93 are obtained.
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lohar, P., Bhaskar, P., Pal, S., Bandyopadhyay, S. (2014). Cross Lingual Snippet Generation Using Snippet Translation System. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_28
DOI: https://doi.org/10.1007/978-3-642-54903-8_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)