Introduction

Abstract

Multilingual information access and retrieval is a specific area of the academic domain of information access and retrieval; its main focus is the development of systems for information discovery in multiple languages, both monolingually and across languages. There is both a social and an economic need for such systems, and there is ample evidence that this need will grow substantially over the coming years. In this introduction, we describe the range and intentions of research and development in this area, from its recognition as an independent discipline in the mid-1990s to the challenges that it faces today.

Douglas W. Oard and David Hull, AAAI Symposium on Cross-Language IR, Spring 1997, Stanford, USA

Notes

  1.

    In this period, an increasing proportion of new users coming online were individuals and small businesses chiefly interested in using the Internet for local communication. In non-English speaking countries, large firms or public institutions may have an incentive to also post their web pages in English, but a small local business does not. As more people in a language community come online, content and service providers have a strong interest in accommodating them in their own language.

  2.

    In 2009, at the Gartner Symposium in Orlando, Eric Schmidt, CEO of Google, predicted that within 5 years the Internet would be dominated by Chinese-language content.

  3.

    Other terms that have been used are Translingual and Cross-Lingual IR. ‘Translingual’ was made popular for a short period by the TIDES project in the US but now seems to have fallen into disuse; ‘cross-lingual’ can still be found but ‘cross-language’ is generally the preferred choice.

  4.

    See the Unicode web page http://www.unicode.org/ for Unicode standards and updates.

  5.

    Internationalisation and localisation are discussed in the section on implementing multilingual user interfaces in Chapter 4.

  6.

    Most of these initiatives have been funded by the Directorate for Digital Content and Cognitive Systems and the Language Technologies programmes.

  7.

    http://tdil.mit.gov.in/

  8.

    Two of these projects which have had considerable impact and are cited several times in this book are the Clarity and the MultiMatch projects. The objective of Clarity was to develop general purpose CLIR techniques which would work with minimal translation resources; MultiMatch aimed at providing personalised access to cultural heritage information over both language and media boundaries.

  9.

    See DARPA policy statement at http://www.darpa.mil/darpatech99/Presentations/scripts/ito/ITOTIDESScript.txt

  10.

    http://theeuropeanlibrary.org/

  11.

    http://www.europeana.eu/

  12.

    Over 15 million at the beginning of 2011.

  13.

    The actual name was ‘Workshop on Cross-Linguistic Information Retrieval’; however, when discussing terminology for this new sector of IR, the participants felt that ‘cross-language’ was a more appropriate term.

  14.

    The creation of test collections for (ML)IR is described in detail in Chapter 5.

  15.

    http://trec.nist.gov/

  16.

    http://www.clef-campaign.org/

  17.

    http://research.nii.ac.jp/ntcir/

  18.

    http://www.isical.ac.in/~clia/index.html

  19.

    TopTenReviews is a website that aggregates reviews of software, hardware, and web services from other sites and publications; see http://translation-software-review.toptenreviews.com/

  20.

    http://babelfish.yahoo.com/

  21.

    IDC is a global provider of market intelligence, advisory services, and events for the information technology, telecommunications, and consumer technology markets; see http://www.idc.com/

  22.

    CEO of Clairvoyance Corporation.

  23.

    IDC Predictions 2009.

  24.

    Think of an English tourist visiting south-east Asia and interested in traditional music and dance. An initial query in English finds preliminary information on dances in Cambodia, Vietnam and Laos. Some of the documents returned have pictures and music associated. The tourist uses these to find similar images and music and also reformulates the query in CLIR mode, specifying that they are interested in target documents in these three languages. The documents returned are no longer in English but are in the national languages accompanied by an MT gist in English.

  25.

    There are approximately 6,800 known languages in the world.

  26.

    If this problem is ever to be overcome, it implies a rethinking of the current mechanisms for CLIR and increased study of language-independent or conceptual mapping systems.

Author information

Corresponding author

Correspondence to Carol Peters.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Peters, C., Braschler, M., Clough, P. (2012). Introduction. In: Multilingual Information Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23008-0_1

  • DOI: https://doi.org/10.1007/978-3-642-23008-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23007-3

  • Online ISBN: 978-3-642-23008-0

  • eBook Packages: Computer Science, Computer Science (R0)
