A Phonetic-Based Approach to Query-by-Example Spoken Term Detection

  • Lluís-F. Hurtado
  • Marcos Calvo
  • Jon Ander Gómez
  • Fernando García
  • Emilio Sanchis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8258)

Abstract

Query-by-Example Spoken Term Detection (QbE-STD) tasks are usually addressed by representing the speech signals as sequences of feature vectors through a parametrization step, and then applying a pattern-matching technique to find candidate detections. In this paper, we propose a phoneme-based approach in which the acoustic frames are first converted into vectors representing the a posteriori probability of every phoneme. This strategy is especially useful when the language of the task is known a priori. We then show how this representation can be used for QbE-STD with both a Segmental Dynamic Time Warping algorithm and a graph-based method. The proposed approach has been evaluated on a QbE-STD task in Spanish, and the results show that it can be an adequate strategy for tackling this kind of problem.
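The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that the query and each utterance have already been converted into phoneme posteriorgrams (one probability vector per frame). It also uses -log of the inner product of two posterior vectors as the local distance, which is an assumption: the abstract does not say which distance the authors use. Finally, it swaps the paper's Segmental DTW for a simpler subsequence DTW that lets the match start and end at any utterance frame. The names `frame_distance` and `subsequence_dtw`, and the toy posterior values, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A posteriorgram: one vector per frame, each holding the (assumed)
# posterior probability of every phoneme for that frame.
Posteriorgram = Sequence[Sequence[float]]

EPS = 1e-10  # floor that keeps log() finite when two vectors do not overlap


def frame_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Local distance between two phoneme-posterior vectors.

    Assumed here: -log of their inner product, which is small when both
    frames put their probability mass on the same phonemes.
    """
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, EPS))


def subsequence_dtw(query: Posteriorgram,
                    utterance: Posteriorgram) -> Tuple[float, int, int]:
    """Best alignment of the whole query against any utterance segment.

    Returns (accumulated cost, first frame, last frame) of the matched
    segment. This is plain subsequence DTW, not Segmental DTW.
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    inf = float("inf")
    cost: List[List[float]] = [[inf] * m for _ in range(n)]
    start: List[List[int]] = [[0] * m for _ in range(n)]

    # First query frame: the match may begin at any utterance frame.
    for j in range(m):
        cost[0][j] = frame_distance(query[0], utterance[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical (i-1, j), diagonal (i-1, j-1) and
            # horizontal (i, j-1); keep the cheapest and its start frame.
            best, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                for c, s in ((cost[i - 1][j - 1], start[i - 1][j - 1]),
                             (cost[i][j - 1], start[i][j - 1])):
                    if c < best:
                        best, best_start = c, s
            cost[i][j] = frame_distance(query[i], utterance[j]) + best
            start[i][j] = best_start

    # Last query frame: the match may end at any utterance frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


if __name__ == "__main__":
    # Toy 3-phoneme posteriorgrams (hypothetical values): the query is
    # phoneme 0 followed by phoneme 1; the utterance contains it at frames 1-2.
    A, B, C = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]
    score, first, last = subsequence_dtw([A, B], [C, A, B, C])
    print(first, last)  # prints "1 2"
```

To turn this into a detector, one would run it for every utterance and keep the segments whose cost falls below a threshold. The cost is not length-normalized here, which is a further simplification relative to a full QbE-STD system.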

Keywords

Spoken Term Detection, Query-by-Example, Automatic Speech Recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Lluís-F. Hurtado (1)
  • Marcos Calvo (1)
  • Jon Ander Gómez (1)
  • Fernando García (1)
  • Emilio Sanchis (1)

  1. Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Spain
