Implementation of Automatic Captioning System to Enhance the Accessibility of Meetings

  • Kosei Fume (email author)
  • Taira Ashikawa
  • Nayuko Watanabe
  • Hiroshi Fujimura
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10896)


As an information-accessibility tool that helps hearing-impaired people follow meetings, real-time captioning based on automatic speech recognition is increasingly expected to replace manual handwritten summaries. However, it remains difficult to provide automatic closed captioning stably, at a practical level of accuracy, across diverse speakers and content. We therefore developed a web-based real-time closed-captioning system that is easy to use in conferences, lectures, forums, and similar settings, refining it through trials and feedback from hearing-impaired employees within the company. This report outlines the system and presents the results of a simple evaluation conducted both inside and outside the company.
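The paper does not publish its implementation, but a web-based real-time captioning view of the kind described typically maintains a list of confirmed (final) recognition results plus the latest unconfirmed (partial) hypothesis, which is overwritten as the recognizer refines it. A minimal sketch of that caption state, under these assumptions (the class and method names are hypothetical, not from the paper):

```python
# Hypothetical sketch of caption state for a web-based real-time captioning UI.
# Assumption: the speech recognizer streams partial hypotheses for the current
# utterance, followed by one final result per utterance.

class CaptionBoard:
    def __init__(self):
        self.final_lines = []   # confirmed (final) recognition results
        self.partial = ""       # latest unconfirmed hypothesis

    def on_result(self, text, is_final):
        if is_final:
            self.final_lines.append(text)
            self.partial = ""   # the final result supersedes the partial
        else:
            self.partial = text # each new partial overwrites the previous one

    def render(self):
        lines = list(self.final_lines)
        if self.partial:
            lines.append(self.partial + " ...")  # mark unconfirmed text
        return "\n".join(lines)

board = CaptionBoard()
board.on_result("good", False)
board.on_result("good morning", False)
board.on_result("good morning everyone", True)
board.on_result("today we", False)
print(board.render())  # final line, then the in-progress partial marked "..."
```

Distinguishing confirmed from in-progress text this way lets viewers read stable lines without distraction while still seeing captions with low latency.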


Keywords: Automatic speech recognition · Captioning · Information accessibility · Meeting support systems



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Kosei Fume¹ (email author)
  • Taira Ashikawa¹
  • Nayuko Watanabe¹
  • Hiroshi Fujimura¹

  1. Corporate Research and Development Center, Toshiba Corporation, Kawasaki, Japan
