Using Word Graphs as Intermediate Representation of Uttered Sentences

  • Jon Ander Gómez
  • Emilio Sanchis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7441)

Abstract

We present an algorithm for building word graphs as an intermediate representation of uttered sentences. No language model is used. The inputs to the algorithm are the pronunciation lexicon, organized as a tree, and the sequence of acoustic frames. Transitions between consecutive units are treated as additional units.
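
As an illustration of a pronunciation lexicon organized as a tree, the following minimal sketch builds a prefix tree over phoneme sequences, so that words sharing an initial pronunciation share nodes. The names `LexNode`, `build_lexical_tree`, `count_nodes` and the toy lexicon are hypothetical and are not taken from the paper, whose actual tree layout (including how transition units are stored) is not described in this abstract.

```python
from dataclasses import dataclass, field


@dataclass
class LexNode:
    """One node of a lexical prefix tree (hypothetical structure)."""
    children: dict = field(default_factory=dict)  # phoneme -> LexNode
    words: list = field(default_factory=list)     # words completed at this node


def build_lexical_tree(lexicon):
    """Build a prefix tree from a mapping word -> list of phonemes."""
    root = LexNode()
    for word, phonemes in lexicon.items():
        node = root
        for phoneme in phonemes:
            # Reuse the child if the prefix already exists, else create it.
            node = node.children.setdefault(phoneme, LexNode())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count the nodes of the tree, root included."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Toy lexicon, invented for illustration only.
TOY_LEXICON = {
    "casa": ["k", "a", "s", "a"],
    "caso": ["k", "a", "s", "o"],
    "cama": ["k", "a", "m", "a"],
}

tree = build_lexical_tree(TOY_LEXICON)
```

In this toy case the three words share the prefix `k a`, so the tree needs 8 nodes (root included) where three separate phoneme chains would need 12 phoneme nodes.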

Nodes represent discrete instants of time and arcs are labelled with words. Each detected word is assigned a confidence measure, computed from the phonetic probabilities of the subsequence of acoustic frames used to complete the word.
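
The graph structure described here can be sketched as follows: nodes are frame indices, and each arc carries a word and a confidence value. The abstract does not give the confidence formula, so the geometric mean of per-frame phonetic probabilities used below is an assumption chosen only for illustration, as are the names `Arc`, `word_confidence` and `add_word`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    start: int         # node: frame index where the word begins
    end: int           # node: frame index where the word ends
    word: str          # arc label
    confidence: float  # confidence measure of the detected word


def word_confidence(frame_probs):
    """Geometric mean of per-frame phonetic probabilities.

    ASSUMPTION: this formula is illustrative, not the paper's definition.
    All probabilities must be strictly positive.
    """
    log_sum = sum(math.log(p) for p in frame_probs)
    return math.exp(log_sum / len(frame_probs))


def add_word(graph, start, end, word, frame_probs):
    """Add an arc to a graph stored as start node -> list of outgoing arcs."""
    arc = Arc(start, end, word, word_confidence(frame_probs))
    graph.setdefault(start, []).append(arc)
    return arc


# Toy usage with invented probabilities for three frames.
graph = {}
arc = add_word(graph, 0, 3, "hola", [0.5, 0.5, 0.5])
```

Averaging in the log domain keeps the score comparable across words of different durations, which is one reason a length-normalized measure is a common choice; whether the paper normalizes this way is not stated in the abstract.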

We evaluated the resulting word graphs by searching for the path that best matches the correct sentence and then measuring its word accuracy, i.e. the oracle word accuracy.
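
The oracle evaluation can be sketched as a dynamic program that finds the path with the fewest word errors against the reference sentence. This sketch assumes that arcs always go forward in time (so sorting nodes by frame index gives a valid processing order), that the reference is non-empty, and that word accuracy follows the usual definition (N - S - D - I) / N. The function name and the toy graph are hypothetical; the paper's own search procedure is not described in this abstract.

```python
def oracle_word_accuracy(arcs, start, final, reference):
    """Best word accuracy over all paths from `start` to `final`.

    arcs: iterable of (start_node, end_node, word) with start_node < end_node.
    reference: non-empty list of words of the correct sentence.
    """
    inf = float("inf")
    n = len(reference)
    outgoing = {}
    nodes = {start, final}
    for u, v, word in arcs:
        outgoing.setdefault(u, []).append((v, word))
        nodes.update((u, v))

    # cost[node][j]: fewest edits aligning some path start -> node
    # with the first j reference words.
    cost = {node: [inf] * (n + 1) for node in nodes}
    cost[start][0] = 0

    for u in sorted(nodes):  # time order is a topological order
        row = cost[u]
        for j in range(n):   # deletion: a reference word has no arc
            row[j + 1] = min(row[j + 1], row[j] + 1)
        for v, word in outgoing.get(u, []):
            target = cost[v]
            for j in range(n + 1):
                if row[j] == inf:
                    continue
                # insertion: the arc's word matches no reference word
                target[j] = min(target[j], row[j] + 1)
                if j < n:
                    # match (cost 0) or substitution (cost 1)
                    step = 0 if word == reference[j] else 1
                    target[j + 1] = min(target[j + 1], row[j] + step)

    errors = cost[final][n]
    return (n - errors) / n


# Toy word graph, invented for illustration only.
TOY_ARCS = [
    (0, 2, "quiero"), (0, 2, "quiera"),
    (2, 5, "un"), (2, 5, "en"),
    (5, 9, "billete"),
]
accuracy = oracle_word_accuracy(TOY_ARCS, 0, 9, ["quiero", "un", "billete"])
```

Here the graph contains the reference sentence as a path, so `accuracy` is 1.0. Because insertions are counted as errors, the value can be negative for graphs whose best path is much longer than the reference.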

Keywords

word graphs, word lattices, lexical tree, confidence measures

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jon Ander Gómez (1)
  • Emilio Sanchis (1)
  1. Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Spain
