Development and Evaluation of Julius-Compatible Interface for Kaldi ASR

  • Yusuke Yamada
  • Takashi Nose
  • Yuya Chiba
  • Akinori Ito
  • Takahiro Shinozaki
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 82)


In recent years, the use of Kaldi has grown rapidly because it has successively adopted various DNN-based speech recognition technologies and has shown high recognition performance. Meanwhile, the Julius speech recognition engine has been widely used, especially in Japan, and has attracted further attention since DNN-HMM was implemented in it. In this paper, we describe the design of interfaces that make the Kaldi speech recognition engine compatible with Julius, give a system overview, and detail the speech input unit and the recognition result output unit. We also discuss the functions that we plan to implement.


DNN-based speech recognition, Kaldi, Julius



Part of this work was supported by JSPS KAKENHI Grant Numbers JP26280055 and JP15H02720.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Yusuke Yamada (1)
  • Takashi Nose (1)
  • Yuya Chiba (1)
  • Akinori Ito (1)
  • Takahiro Shinozaki (2)

  1. Graduate School of Engineering, Tohoku University, Aoba-ku, Sendai-shi, Japan
  2. Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Japan
