Abstract
Current storage and processing facilities have led to the emergence of many multimedia repositories and, consequently, to the need for new approaches to information retrieval. In particular, spoken document retrieval is a very complex task, since existing speech recognition systems tend to produce various transcription errors (such as word substitutions, insertions and deletions). To deal with these errors, this paper proposes an enriched document representation based on a phonetic codification of the automatic transcriptions. This representation aims to reduce the impact of transcription errors by representing words with similar pronunciations through the same phonetic code. Experimental results on the CL-SR corpus from CLEF 2007 (which includes 33 test topics and 8,104 English interviews) are encouraging: our method achieved a mean average precision of 0.0795, outperforming all but one of the systems evaluated at this forum.
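The abstract does not specify the phonetic coding scheme or how codes are combined with the original words. The sketch below assumes standard American Soundex (the scheme named in the paper's title) and a hypothetical `enrich` function that simply emits each transcribed word followed by its code; the paper's actual representation may differ. It illustrates how two spellings that an ASR system might confuse, such as "smith" and "smyth", collapse onto the same code.

```python
import re

# American Soundex digit classes; vowels, y, h and w receive no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]          # the first letter is kept verbatim
    prev = _CODES.get(first, "")      # its digit still suppresses an identical next digit
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "hw":
            # Vowels reset `prev`, so identical digits separated by a vowel are both kept;
            # h and w leave `prev` unchanged, so identical digits across them collapse.
            prev = code
    return ("".join(result) + "000")[:4]  # pad with zeros, truncate to 4 characters


def enrich(transcript: str) -> list[str]:
    """Hypothetical enriched representation: every token followed by its Soundex code."""
    out = []
    for tok in re.findall(r"[a-z]+", transcript.lower()):
        out.append(tok)
        out.append(soundex(tok))
    return out


if __name__ == "__main__":
    print(enrich("Smyth said their"))
    # ['smyth', 'S530', 'said', 'S300', 'their', 'T600']
    print(soundex("smith") == soundex("smyth"))  # True: both map to S530
```

Because Soundex always keeps the first letter, confusions that change the initial sound (for example "knight" vs. "night", coded K523 and N230) are not merged by this scheme.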
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reyes-Barragán, M.A., Villaseñor-Pineda, L., Montes-y-Gómez, M. (2008). A Soundex-Based Approach for Spoken Document Retrieval. In: Gelbukh, A., Morales, E.F. (eds) MICAI 2008: Advances in Artificial Intelligence. MICAI 2008. Lecture Notes in Computer Science, vol 5317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88636-5_19
DOI: https://doi.org/10.1007/978-3-540-88636-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88635-8
Online ISBN: 978-3-540-88636-5
eBook Packages: Computer Science, Computer Science (R0)