Using Semantic and Phonetic Term Similarity for Spoken Document Retrieval and Spoken Query Processing

  • Fabio Crestani
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 89)


In classical Information Retrieval systems a relevant document will not be retrieved in response to a query if the document and query representations do not share at least one term. This problem is known as “term mismatch”. A similar problem can be found in spoken document retrieval and spoken query processing, where terms misrecognized by the speech recognition process can hinder the retrieval of potentially relevant documents. I will call this problem “term misrecognition”, by analogy to the term mismatch problem.

This paper presents two classes of retrieval models that attempt to tackle both the term mismatch and the term misrecognition problems at retrieval time using term similarity information. The models make effective use of complete or partial knowledge of semantic and phonetic term similarity, estimated from the corpus using statistical methods.
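The general idea of using term similarity at retrieval time can be illustrated with a minimal sketch. It is not the chapter's model: the scoring rule (best similarity-weighted match per query term), the function names, the term weights, and the similarity value are all assumptions made up for illustration. The sketch only contrasts exact term matching, which scores zero when query and document share no term, with a score that also credits similar non-matching terms.

```python
from typing import Dict, List, Tuple

# Hypothetical types: document term weights and pairwise term similarity.
TermWeights = Dict[str, float]
Similarity = Dict[Tuple[str, str], float]


def exact_match_score(query: List[str], doc: TermWeights) -> float:
    """Sum the weights of document terms that literally occur in the query."""
    return sum(doc.get(term, 0.0) for term in query)


def similarity_score(query: List[str], doc: TermWeights, sim: Similarity) -> float:
    """For each query term, credit the best similarity-weighted document term.

    A term is taken to be fully similar (1.0) to itself; pairs missing from
    `sim` are treated as having similarity 0.0. Both choices are assumptions.
    """
    total = 0.0
    for q_term in query:
        best = 0.0
        for d_term, weight in doc.items():
            s = 1.0 if q_term == d_term else sim.get((q_term, d_term), 0.0)
            best = max(best, s * weight)
        total += best
    return total


# Illustrative data only: one document, one query, one similarity estimate.
doc = {"automobile": 0.8, "engine": 0.5}
query = ["car"]
sim = {("car", "automobile"): 0.9}

mismatch = exact_match_score(query, doc)      # no shared term
recovered = similarity_score(query, doc, sim)  # "car" credited via "automobile"
```

The same scoring function could be fed a phonetic similarity table instead of a semantic one, for example one that relates a misrecognized word to the word that was actually spoken; how such tables are estimated is outside this sketch.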


Information Retrieval, Semantic Similarity, Retrieval Model, Query Term, Indexing Weight





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Fabio Crestani
    • 1
  1. Department of Computer Science, University of Strathclyde, Glasgow, Scotland, UK
