Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications

  • Shima Tabibian
  • Ahmad Akbari
  • Babak Nasersharif

Abstract

Spoken term detection (STD) refers to discovering all occurrences of a given term in a set of speech utterances. One well-known approach to STD is phone lattice search (PLS), which represents speech utterances as phone lattices. Since the accuracy of the phone recognizer affects the accuracy of the STD system, the PLS approach uses the minimum edit distance (MED) measure to compensate for phone recognizer errors. While this measure increases the detection rate, it also raises the false alarm rate. In this paper, we take the PLS approach as the baseline and add Viterbi scoring and the Jaro-Winkler similarity measure to decrease the false alarm rate. Because the proposed approach combines more techniques than the baseline, search speed may decrease. To overcome this problem, we use lattice pruning to increase search speed in online applications and indexing techniques, such as a depth-first search algorithm, to increase search speed in offline applications. We report experimental results for monophone-based and triphone-based STD systems. The results indicate that the triphone-based STD system improved performance by about 2% compared with the monophone-based system. Moreover, with triphone-based models, the proposed approach, which combines the MED measure, Viterbi scores and the Jaro-Winkler similarity measure, improved accuracy by about 17% compared with the method that uses only the MED measure.
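The abstract relies on two string measures computed over phone sequences. The following is a minimal Python sketch, not the authors' implementation, assuming each lattice candidate has already been reduced to a plain list of phone labels. `min_edit_distance` is the standard Levenshtein distance with unit costs (the paper's MED may weight substitutions differently), and `jaro_winkler` follows Winkler's usual formulation with scaling factor 0.1 and a common-prefix cap of four. The function names and phone labels are illustrative only.

```python
def min_edit_distance(query, hyp):
    """Levenshtein distance between two phone sequences, unit costs (illustrative)."""
    n = len(hyp)
    prev = list(range(n + 1))  # distances from the empty query prefix
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            sub = 0 if query[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion of a query phone
                         cur[j - 1] + 1,     # insertion of a hypothesis phone
                         prev[j - 1] + sub)  # match or substitution
        prev = cur
    return prev[n]


def jaro(s, t):
    """Jaro similarity between two sequences (strings or lists of phones)."""
    if not s and not t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)  # matching window
    s_flags = [False] * len(s)
    t_flags = [False] * len(t)
    matches = 0
    for i, a in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_flags[j] and t[j] == a:
                s_flags[i] = t_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched symbols that are out of order.
    s_matched = [a for a, f in zip(s, s_flags) if f]
    t_matched = [b for b, f in zip(t, t_flags) if f]
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    query = ["s", "a", "l", "a", "m"]  # hypothetical query phone sequence
    hyp = ["s", "a", "l", "o", "m"]    # hypothetical lattice path with one error
    print(min_edit_distance(query, hyp))         # 1
    print(round(jaro_winkler(query, hyp), 3))    # 0.907
```

Both measures tolerate the single substituted phone, which is the point of MED-based matching; how the paper thresholds these values or combines them with Viterbi scores is not stated in the abstract, so no combination rule is shown here.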

Keywords

Spoken term detection · Phone lattice · Lattice search · Scoring · Distance measure

Notes

Acknowledgements

We thank the Iran Telecommunication Research Center for its support during this work.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Shima Tabibian (1), corresponding author
  • Ahmad Akbari (2)
  • Babak Nasersharif (2, 3)
  1. Cyberspace Research Institute, Shahid Beheshti University, Tehran, Iran
  2. Audio and Speech Processing Lab, Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran
  3. Computer Engineering Department, K.N. Toosi University of Technology, Tehran, Iran