A Soundex-Based Approach for Spoken Document Retrieval

  • Conference paper
MICAI 2008: Advances in Artificial Intelligence (MICAI 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5317)

Abstract

Advances in storage and processing capacity have led to the emergence of many multimedia repositories and, in turn, to the need for new information retrieval approaches. Spoken document retrieval is a particularly complex task because existing speech recognition systems tend to introduce transcription errors such as word substitutions, insertions, and deletions. To deal with these errors, this paper proposes an enriched document representation based on a phonetic codification of the automatic transcriptions. The representation aims to reduce the impact of transcription errors by assigning the same phonetic code to words with similar pronunciations. Experimental results on the CL-SR corpus from CLEF 2007 (which includes 33 test topics and 8,104 English interviews) are encouraging: our method achieved a mean average precision of 0.0795, outperforming all but one of the systems evaluated at that forum.
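To illustrate how a phonetic code can map similarly pronounced words to the same key, the following is a minimal sketch of the widely used American Soundex scheme. The abstract does not specify the exact coding variant or how the representation is enriched, so both the `soundex` function and the hypothetical `enrich` helper (which simply appends codes to the original tokens) are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of standard American Soundex (an assumption; the paper's
# exact phonetic codification is not given in the abstract).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word, length=4):
    """Return the Soundex code of `word`, zero-padded to `length` characters."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    prev = CODES.get(first, "")  # the first letter's digit also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a digit
        digit = CODES.get(ch, "")  # vowels (and y) map to "" and reset `prev`
        if digit and digit != prev:
            code += digit
        prev = digit
        if len(code) == length:
            break
    return code.ljust(length, "0")


def enrich(tokens):
    """Hypothetical enrichment: keep the tokens and append their Soundex codes."""
    return list(tokens) + [soundex(t) for t in tokens]


print(soundex("weather"), soundex("whether"))  # W360 W360
print(enrich(["the", "weather"]))  # ['the', 'weather', 'T000', 'W360']
```

Under this sketch, a recognizer that substitutes "whether" for "weather" still produces a term (W360) that matches the query, which is the kind of robustness the abstract describes.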

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reyes-Barragán, M.A., Villaseñor-Pineda, L., Montes-y-Gómez, M. (2008). A Soundex-Based Approach for Spoken Document Retrieval. In: Gelbukh, A., Morales, E.F. (eds) MICAI 2008: Advances in Artificial Intelligence. MICAI 2008. Lecture Notes in Computer Science, vol 5317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88636-5_19

  • DOI: https://doi.org/10.1007/978-3-540-88636-5_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88635-8

  • Online ISBN: 978-3-540-88636-5

  • eBook Packages: Computer Science, Computer Science (R0)
