Investigating the Recognition of Non-articulatory Sounds by Using Statistical Tests and Support Vector Machine

  • Francisco Carlos M. Souza
  • Alinne C. Corrêa Souza
  • Gilberto M. Nakamura
  • M. D. Soares
  • Patrícia Pupin Mandrá
  • Alessandra A. Macedo
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 738)


People with articulation and phonological disorders need training to plan and execute the sounds of speech. Compared to other children, children with Down Syndrome show significantly delayed speech development because they present developmental disabilities, chiefly apraxia of speech. In practice, speech therapists plan and conduct training of articulatory and non-articulatory sounds, such as blow production and lip popping, to assist speech production. Mobile applications can be integrated into clinical treatment to transcend the boundaries of clinics and schedules and thus reach more people at any time, and artificial intelligence and machine learning techniques can improve this kind of application. The aim of this pilot study is to assess speech recognition methods that prioritize the training of sounds for speech production, particularly non-articulatory sounds. These methods apply Mel-Frequency Cepstrum Coefficients and the Laplace transform to extract features, and use traditional statistical tests and a Support Vector Machine (SVM) to recognize sounds. This study also reports experimental results on the effectiveness of the methods over a set of 197 sounds. Overall, the SVM provides the higher accuracy.
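The pipeline described above (cepstral feature extraction followed by SVM classification) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the signals are synthetic stand-ins for blow and lip-pop recordings, and a plain cepstrum (log power spectrum followed by an inverse FFT, with no mel filter bank) stands in for the full MFCC front end.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def cepstral_features(signal, n_coeffs=13):
    """Simplified cepstrum: log power spectrum, then an inverse FFT.
    A rough stand-in for the MFCC pipeline (no mel filter bank)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    log_spectrum = np.log(spectrum + 1e-10)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum)
    return cepstrum[:n_coeffs]                # keep the low-order coefficients

def synth_blow(n=1024):
    # Sustained, noise-like signal: a crude stand-in for a blow sound.
    return rng.normal(size=n) * np.hanning(n)

def synth_pop(n=1024):
    # Short transient burst: a crude stand-in for a lip-pop sound.
    sig = np.zeros(n)
    sig[100:164] = rng.normal(size=64) * np.hanning(64) * 5.0
    return sig

# Build a balanced two-class feature set (label 0 = blow, 1 = pop).
X = np.array([cepstral_features(synth_blow()) for _ in range(100)]
             + [cepstral_features(synth_pop()) for _ in range(100)])
y = np.array([0] * 100 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# RBF-kernel SVM, as in the scikit-learn defaults; held-out accuracy.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

On these well-separated synthetic classes the held-out accuracy is high; real non-articulatory recordings would of course be noisier, which is why the study compares the SVM against statistical tests.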


Keywords: Delayed speech development · Speech recognition methods · Machine learning · Automatic speech recognition



The authors are grateful to CNPq (442533/2016-0) and FAPESP (2016/13206-4) for the funding. We would also like to thank Maria Roberta Cantarelli, Myrian Neves, Thais Moretti, Aline Camargo, and the individuals with DS for their participation.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Francisco Carlos M. Souza (1)
  • Alinne C. Corrêa Souza (1)
  • Gilberto M. Nakamura (1)
  • M. D. Soares (2)
  • Patrícia Pupin Mandrá (1)
  • Alessandra A. Macedo (1, corresponding author)
  1. University of São Paulo (USP), Ribeirão Preto, Brazil
  2. National Institute for Space Research, São José dos Campos, Brazil
