Abstract
The ongoing growth in collections of natural-language digital documents in various media and languages creates both challenges and opportunities for automatically accessing the information these documents contain. This paper describes the CLEF 2002 pilot track investigation of Cross-Language Spoken Document Retrieval (CLSDR), which combines information retrieval, cross-language translation, and speech recognition. The experimental investigation is based on the TREC-8 and TREC-9 SDR evaluation tasks, augmented to form a CLSDR task. The original task of retrieving English-language spoken documents using English request topics is compared with cross-language retrieval using French, German, Italian, and Spanish topic translations. The results of the pilot track establish baseline performance levels and indicate that pseudo-relevance feedback and contemporaneous text document collections can be used to improve CLSDR performance.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jones, G.J.F., Federico, M. (2003). CLEF 2002 Cross-Language Spoken Document Retrieval Pilot Track Report. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_39
DOI: https://doi.org/10.1007/978-3-540-45237-9_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40830-7
Online ISBN: 978-3-540-45237-9
eBook Packages: Springer Book Archive