Speech Recognition Application Using Deep Learning Neural Network

  • Akzharkyn Izbassarova
  • Aziza Duisembay
  • Alex Pappachen James
Part of the Modeling and Optimization in Science and Technologies book series (MOST, volume 14)


Deep neural networks (DNNs) have demonstrated great potential in speech recognition systems. This chapter presents two successful implementations of speech recognition based on DNN models. The first is a DNN model developed by Apple for its personal assistant, Siri. To detect and recognize the “Hey Siri” phrase, the program runs a detector based on a 5-layer network with 32 and 192 hidden units. The acoustic model uses sigmoid and softmax activation functions together with a recurrent network. The second is a region-based convolutional recurrent neural network (R-CRNN) designed by Amazon for rare sound detection in home speakers. This system is used in a security package called Alexa Guard. Running complex machine learning algorithms with efficient power and memory utilization requires special hardware, so the chapter also describes the hardware solutions used in mobile phones and home speakers to process complex DNN models.
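
As a rough illustration of the kind of detector network described above, the following minimal sketch runs a forward pass through a fully connected network with sigmoid hidden layers and a softmax output. It is not Apple's implementation. It assumes that "5-layer" means five hidden layers of equal width (32 or 192 units); the input size, the number of output classes, the random untrained weights, and all function names are hypothetical and chosen only for the example. The recurrent part of the acoustic model is not covered.

```python
import math
import random


def sigmoid(x):
    """Logistic activation used in the hidden layers."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_in, n_out, rng):
    """Create one dense layer with small random weights (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one dense layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def build_network(n_inputs, hidden_units, n_hidden_layers, n_classes, seed=0):
    """Stack equal-width hidden layers followed by one output layer."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [hidden_units] * n_hidden_layers + [n_classes]
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, features):
    """Sigmoid on every hidden layer, softmax on the output layer."""
    activations = features
    for layer in network[:-1]:
        activations = [sigmoid(z) for z in affine(layer, activations)]
    return softmax(affine(network[-1], activations))


if __name__ == "__main__":
    # Hypothetical sizes: 26 input features, 4 output classes.
    for width in (32, 192):
        net = build_network(n_inputs=26, hidden_units=width,
                            n_hidden_layers=5, n_classes=4)
        posteriors = forward(net, [0.0] * 26)
        print(width, [round(p, 3) for p in posteriors])
```

The two widths share the same code path, which reflects why a smaller network is attractive on power-constrained hardware: the number of multiply-accumulate operations per hidden layer grows with the square of the layer width.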



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Akzharkyn Izbassarova (1)
  • Aziza Duisembay (1)
  • Alex Pappachen James (1)
  1. Nazarbayev University, Astana, Kazakhstan
