Improving Speech Recognizer Using Neuro-genetic Weights Connection Strategy for Spoken Query Information Retrieval

Seman, Noraini; Abu Bakar, Zainab; Jamil, Nursuriati

doi:10.1007/978-3-642-45068-6_45

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8281)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

This paper describes the integration of a speech recognizer into an information retrieval (IR) system to retrieve text documents relevant to given spoken queries. Our aim is to improve the speech recognizer, since it is a crucial front end of a spoken query IR system. When speech is used as the source material for indexing and retrieval, the effect of transcription errors on retrieval effectiveness must be considered. We therefore propose a dynamic weights connection strategy that combines two artificial intelligence (AI) learning methods, genetic algorithms (GA) and neural networks (NN), to improve the speech recognizer. The two algorithms are separate modules and are used to find the optimum weights for the hidden and output layers of a feed-forward artificial neural network (ANN) model. A mutated GA technique is proposed and compared with the standard GA technique. One hundred experiments were conducted using 50 selected words from spontaneous speech. Speech recognition performance was evaluated with the standard word error rate (WER), and retrieval performance with precision and recall measured against manual transcriptions. The proposed method achieved 95.39% recognition performance on spoken query input, reducing the error rate to 4.61%. For retrieval, the mutated GA+ANN model achieved 91% precision and 83% recall. Notably, the degradation in precision and recall is the same as the degradation in the recognition performance of the speech recognition engine. These results suggest that combining GA with ANN offers certain advantages while attaining sufficient accuracy.
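The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `word_error_rate` and `precision_recall`, the example strings, and the document IDs are hypothetical. WER is computed here in its usual form, as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. Using the documents retrieved for the manually transcribed query as the reference set is one assumed reading of "with respect to manual transcriptions".

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def precision_recall(retrieved: set, reference: set) -> tuple:
    """Precision and recall of a retrieved set against a reference set."""
    hits = len(retrieved & reference)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical query: manual transcription vs. recognizer output.
wer = word_error_rate("find malay speech papers", "find malay beach papers")

# Hypothetical document IDs: results for the recognized query vs. results
# for the manually transcribed query (assumed to be the reference set).
precision, recall = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
```

The reported 95.39% recognition performance and 4.61% error rate sum to 100%, which is consistent with recognition performance being stated as 100% minus the error rate.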

Author information

Authors and Affiliations

Computer Science Department, Faculty of Computer & Mathematical Sciences, MARA University of Technology (UiTM), 40450, Shah Alam, Selangor, Malaysia
Noraini Seman, Zainab Abu Bakar & Nursuriati Jamil

Editor information

Editors and Affiliations

Institute for Infocomm Research, Human Language Technology, 1 Fusionopolis Way #21-01, Connexis South, 138632, Singapore
Rafael E. Banchs, Min Zhang & Sheng Gao
Yahoo Labs, Avinguda Diagonal 177, 08018, Barcelona, Spain
Fabrizio Silvestri
Microsoft Research Asia, No. 5, Danling Street, Haidian District, 100080, Beijing, China
Tie-Yan Liu
Institute for Infocomm Research, Human Language Technology, 1 Fusionopolis Way #21-01, Connexis South, 138632, Singapore
Jun Lang

About this paper

Cite this paper

Seman, N., Abu Bakar, Z., Jamil, N. (2013). Improving Speech Recognizer Using Neuro-genetic Weights Connection Strategy for Spoken Query Information Retrieval. In: Banchs, R.E., Silvestri, F., Liu, TY., Zhang, M., Gao, S., Lang, J. (eds) Information Retrieval Technology. AIRS 2013. Lecture Notes in Computer Science, vol 8281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45068-6_45

DOI: https://doi.org/10.1007/978-3-642-45068-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45067-9
Online ISBN: 978-3-642-45068-6
eBook Packages: Computer Science, Computer Science (R0)