A Soundex-Based Approach for Spoken Document Retrieval

Reyes-Barragán, M. Alejandro; Villaseñor-Pineda, Luis; Montes-y-Gómez, Manuel

doi:10.1007/978-3-540-88636-5_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5317)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Advances in storage and processing capacity have led to many multimedia repositories and, in turn, to the need for new information retrieval approaches. Spoken document retrieval is particularly difficult because existing speech recognition systems tend to produce transcription errors (word substitutions, insertions, and deletions). To deal with these errors, this paper proposes an enriched document representation based on a phonetic codification of the automatic transcriptions. The representation aims to reduce the impact of transcription errors by representing words with similar pronunciations through the same phonetic code. Experimental results on the CL-SR corpus from CLEF 2007 (which includes 33 test topics and 8,104 English interviews) are encouraging: our method achieved a mean average precision of 0.0795, outperforming all but one of the systems evaluated at that forum.
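
To illustrate the idea of phonetic codification, the following is a minimal sketch of the classic American Soundex code, which maps similar-sounding words to the same four-character key. It is not the authors' implementation: the function names `soundex` and `enrich` are hypothetical, and appending the codes to the original tokens is only one assumed reading of "enriched document representation"; the abstract does not specify how words and codes are combined.

```python
import re

# Standard Soundex consonant groups; vowels, y, h and w carry no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of `word`."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept
        if len(code) == 4:
            break
    return code.ljust(4, "0")  # pad short codes with zeros


def enrich(transcript: str) -> list[str]:
    """Assumed enrichment: original tokens followed by their Soundex codes."""
    tokens = re.findall(r"[A-Za-z]+", transcript)
    return [t.lower() for t in tokens] + [soundex(t) for t in tokens]
```

Under this sketch, `soundex("Robert")` and `soundex("Rupert")` yield the same code, so a recognizer that substitutes one word for a similar-sounding one would still produce a matching index term.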

Author information

Authors and Affiliations

Laboratorio de Tecnologías del Lenguaje, Instituto Nacional de Astrofísica, Óptica y Electrónica, México
M. Alejandro Reyes-Barragán, Luis Villaseñor-Pineda & Manuel Montes-y-Gómez

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, México
Alexander Gelbukh
Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1, Sta. María Tonantzintla, 72840, Puebla, México
Eduardo F. Morales

About this paper

Cite this paper

Reyes-Barragán, M.A., Villaseñor-Pineda, L., Montes-y-Gómez, M. (2008). A Soundex-Based Approach for Spoken Document Retrieval. In: Gelbukh, A., Morales, E.F. (eds) MICAI 2008: Advances in Artificial Intelligence. MICAI 2008. Lecture Notes in Computer Science, vol 5317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88636-5_19

DOI: https://doi.org/10.1007/978-3-540-88636-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88635-8
Online ISBN: 978-3-540-88636-5
eBook Packages: Computer Science, Computer Science (R0)