CLEF 2002 Cross-Language Spoken Document Retrieval Pilot Track Report

  • Conference paper
Advances in Cross-Language Information Retrieval (CLEF 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2785)

Abstract

The current expansion in collections of natural language based digital documents in various media and languages is creating challenging opportunities for automatically accessing the information contained in these documents. This paper describes the CLEF 2002 pilot track investigation of Cross-Language Spoken Document Retrieval (CLSDR) combining information retrieval, cross-language translation and speech recognition. The experimental investigation is based on the TREC-8 and TREC-9 SDR evaluation tasks, augmented to form a CLSDR task. The original task of retrieving English language spoken documents using English request topics is compared with cross-language retrieval using French, German, Italian and Spanish topic translations. The results of the pilot track establish baseline performance levels and indicate that pseudo relevance feedback and contemporaneous text document collections can be used to improve CLSDR performance.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jones, G.J.F., Federico, M. (2003). CLEF 2002 Cross-Language Spoken Document Retrieval Pilot Track Report. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_39

  • DOI: https://doi.org/10.1007/978-3-540-45237-9_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40830-7

  • Online ISBN: 978-3-540-45237-9

  • eBook Packages: Springer Book Archive
