Abstract
Multilingual information access and retrieval is a specific area of the academic domain of information access and retrieval; its main focus is the development of systems for information discovery in multiple languages, both monolingually and across languages. There is both a social and an economic need for such systems, and there is ample evidence that this need will grow substantially over the coming years. In this introduction, we describe the range and intentions of research and development in this area, from its recognition as an independent discipline in the mid-1990s to the challenges that it faces today.
Douglas W. Oard and David Hull, AAAI Symposium on Cross-Language IR, Spring 1997, Stanford, USA
Notes
- 1.
In this period, an increasing proportion of new users coming online were individuals and small businesses chiefly interested in using the Internet for local communication. In non-English speaking countries, large firms or public institutions may have an incentive to also post their web pages in English, but a small local business does not. As more people in a language community come online, content and service providers have a strong interest in accommodating them in their own language.
- 2.
In 2009, at the Gartner Symposium in Orlando, Eric Schmidt, CEO of Google, predicted that within 5 years the Internet would be dominated by Chinese-language content.
- 3.
Other terms that have been used are Translingual and Cross-Lingual IR. ‘Translingual’ was made popular for a short period by the TIDES project in the US but now seems to have fallen into disuse; ‘cross-lingual’ can still be found but ‘cross-language’ is generally the preferred choice.
- 4.
See the Unicode web page http://www.unicode.org/ for Unicode standards and updates.
- 5.
Internationalisation and localisation are discussed in the section on implementing multilingual user interfaces in Chapter 4.
- 6.
Most of these initiatives have been funded by the Directorate for Digital Content and Cognitive Systems and the Language Technologies programmes.
- 7.
- 8.
Two of these projects which have had considerable impact and are cited several times in this book are the Clarity and the MultiMatch projects. The objective of Clarity was to develop general purpose CLIR techniques which would work with minimal translation resources; MultiMatch aimed at providing personalised access to cultural heritage information over both language and media boundaries.
- 9.
See DARPA policy statement at http://www.darpa.mil/darpatech99/Presentations/scripts/ito/ITOTIDESScript.txt
- 10.
- 11.
- 12.
Over 15 million at the beginning of 2011.
- 13.
The actual name was ‘Workshop on Cross-Linguistic Information Retrieval’; however, when discussing terminology for this new sector of IR, the participants felt that ‘cross-language’ was a more appropriate term.
- 14.
The creation of test collections for (ML)IR is described in detail in Chapter 5.
- 15.
- 16.
- 17.
http://research.nii.ac.jp/ntcir/
- 18.
- 19.
TopTenReviews is a website which aggregates reviews for software, hardware, and web services, from other sites and publications, see http://translation-software-review.toptenreviews.com/
- 20.
- 21.
IDC is a global provider of market intelligence, advisory services, and events for the information technology, telecommunications, and consumer technology markets, see http://www.idc.com/
- 22.
CEO of Clairvoyance Corporation.
- 23.
IDC Predictions 2009.
- 24.
Think of an English tourist visiting south-east Asia and interested in traditional music and dance. An initial query in English finds preliminary information on dances in Cambodia, Vietnam and Laos. Some of the documents returned have pictures and music associated. The tourist uses these to find similar images and music and also reformulates the query in CLIR mode, specifying that they are interested in target documents in these three languages. The documents returned are no longer in English but are in the national languages accompanied by an MT gist in English.
- 25.
There are approximately 6,800 known languages in the world.
- 26.
If this problem is ever to be overcome, it implies a rethinking of the current mechanisms for CLIR and increased study of language-independent or conceptual mapping systems.
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Peters, C., Braschler, M., Clough, P. (2012). Introduction. In: Multilingual Information Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23008-0_1
DOI: https://doi.org/10.1007/978-3-642-23008-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23007-3
Online ISBN: 978-3-642-23008-0
eBook Packages: Computer Science (R0)