Using Semantic and Phonetic Term Similarity for Spoken Document Retrieval and Spoken Query Processing
In classical Information Retrieval systems a relevant document will not be retrieved in response to a query if the document and query representations do not share at least one term. This problem is known as “term mismatch”. A similar problem can be found in spoken document retrieval and spoken query processing, where terms misrecognized by the speech recognition process can hinder the retrieval of potentially relevant documents. I will call this problem “term misrecognition”, by analogy to the term mismatch problem.
This paper presents two classes of retrieval models that attempt to tackle both the term mismatch and the term misrecognition problems at retrieval time using term similarity information. The models make effective use of complete or partial knowledge of semantic and phonetic term similarity, estimated from the corpus using statistical methods.
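The effect of term mismatch, and the way term similarity can soften it at retrieval time, can be illustrated with a minimal Python sketch. It is not the model defined in the paper: the function names, the "best matching document term" rule, and all weights and similarity values below are assumptions chosen only for illustration. The same similarity table could hold semantic or phonetic similarities.

```python
from typing import Dict, Tuple

Weights = Dict[str, float]                  # term -> indexing weight
Similarity = Dict[Tuple[str, str], float]   # (query term, doc term) -> similarity


def exact_match_score(query: Weights, doc: Weights) -> float:
    """Classical inner-product score: only shared terms contribute."""
    return sum(qw * doc[t] for t, qw in query.items() if t in doc)


def similarity_score(query: Weights, doc: Weights, sim: Similarity) -> float:
    """Hypothetical similarity-aware score (an assumption, not the paper's formula).

    Each query term is credited with its best-matching document term.
    A term is fully similar to itself (1.0); pairs missing from `sim` count as 0.0.
    """
    total = 0.0
    for qt, qw in query.items():
        best = 0.0
        for dt, dw in doc.items():
            s = 1.0 if qt == dt else sim.get((qt, dt), 0.0)
            best = max(best, s * dw)
        total += qw * best
    return total


# Made-up example data: the query and the document share no term.
query = {"car": 1.0}
doc = {"automobile": 0.8, "engine": 0.5}
sim = {("car", "automobile"): 0.9, ("car", "engine"): 0.4}

print(exact_match_score(query, doc))      # no shared term, so the score is zero
print(similarity_score(query, doc, sim))  # positive, driven by "automobile"
```

With exact matching the document is never retrieved for this query, while the similarity-aware score ranks it above documents that contain no related term. A misrecognized term could be handled the same way by filling the table with phonetic rather than semantic similarities.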
Keywords: Information Retrieval, Semantic Similarity, Retrieval Model, Query Term, Indexing Weight