Abstract
Speech remediation by identifying those segments which take away from the substance of the speech content can be performed by identifying portions of speech which may be deleted without diminishing from the speech quality, but rather improving the speech. Speech remediation is important when the speech is disfluent as in the case of stuttered speech. We describe two stuttered speech remediation approaches based on the identification of those segments of speech which, when removed, would enhance speech understandability in terms of both, speech content and speech flow. We adopted two approaches, in the first approach we identify and extract speech segments that have weak semantic significance due to their low relative intensity; we subsequently trained several classifiers using a large set of inherent and derived features which provided a second layer filtering stage. The first approach was effective but required a two step process. In order to streamline the detection and remediation process, we introduced an enhancement which expands the realm of disfluency detection to include a broader range of speech anomalies by eliminating the need for a domain-dependent pre-qualification stage. The results of the new approach offer improved accuracy with enhanced simplicity, flexibility and extensibility.
Similar content being viewed by others
References
Ai, O.C., Hariharan, M., Yaacob, S., Chee, L.S. (2012). Classification of speech dysfluencies with MFCC and LPCC features, (Vol. 39 pp. 2157–2165).
Arbajian, P., Hajja, A., Raś, Z.W., Wieczorkowska, A.A. (2017). Segment-Removal based stuttered speech remediation. In International workshop on new frontiers in mining complex patterns (pp. 16–34). Cham: Springer.
Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In Proceedings of the institute of phonetic sciences, vol. 17, no. 1193.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345.
Chee, L.S., Ai, O.C., Yaacob, S. (2009). Overview of automatic stuttering recognition system. In Proc. International conference on man-machine systems (pp. 1–6) no. october, Batu Ferringhi, Penang Malaysia.
Czyzewski, A., Kaczmarek, A., Kostek, B. (2003). Intelligent processing of stuttered speech. J. Intell. IN Inf. Syst., 21, 143–171.
Esmaili, I., Dabanloo, N.J., Vali, M. (2016). Automatic classification of speech dysfluencies in continuous speech based on similarity measures and morphological image processing tools. Biomedical Signal Processing and Control, 23, 104–114.
Fook, C.Y., Muthusamy, H., Chee, L.S., Yaacob, S.B., Adom, A.H.B. (2013). Comparison of speech parameterization techniques for the classification of speech disfluencies. In Turkish journal of electrical engineering & computer sciences, vol. 21, no. Sup. 1.
Hariharan, M., Chee, L.S., Ai, O.C., Yaacob, S. (2012). Classification of speech dysfluencies using LPC based parameterization techniques, (Vol. 36 pp. 1821–1830).
Honal, M., & Schultz, T. (2003). Correction of disfluencies in spontaneous speech using a noisy-channel approach. In Interspeech.
Honal, M., & Schultz, T. (2005). Automatic disfluency removal on recognized spontaneous Speech-Rapid adaptation to speaker dependent disfluencies. In ICASSP (no. 1, pp. 969–972).
Howell, P., Davis, S., Bartrip, J. (2009). The UCLASS archive of stuttered speech, (Vol. 52 pp. 556–569).
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26. https://doi.org/10.18637/jss.v028.i05.
KM, R.K., & Ganesan, S. (2011). Comparison of multidimensional MFCC feature vectors for objective assessment of stuttered disfluencies. International Journal of Advanced Networking Applications, 2(05), 854–860.
Lease, M., Johnson, M., Charniak, E. (2006). Recognizing disfluencies in conversational speech. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1566–1573.
Liu, Y., Shriberg, E., Stolcke, A., Harper, M.P. (2005). Comparing HMM, maximum entropy, and conditional random fields for disfluency detection. In Interspeech (pp. 3313–3316).
Raghavendra, M., & Rajeswari, P. (2016). Determination of disfluencies associated in stuttered speech using MFCC feature extraction, (Vol. 4 pp. 2321–9939).
Ravikumar, K.M., Rajagopal, R., Nagaraj, H.C. (2009). An approach for objective assessment of stuttered speech using MFCC. In The international congress for global science and technology (p. 19).
Ribeiro, M.T., Singh, S., Guestrin, C. (2016). Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144). ACM.
Świetlicka, I., Kuniszyk-Jóźkowiak, W., Smołka, E. (2013). Hierarchical ANN system for stuttering identification, (Vol. 27 pp. 228–242).
The H2O.ai team. (2015). h2o: Python Interface for H2O Python package version 3.1.0.99999, https://github.com/h2oai/h2o-3.
Winkelmann, R., & Raess, G. (2014). Introducing a web application for labeling, visualizing speech and correcting derived speech signals. In LREC (pp. 4129–4133).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Arbajian, P., Hajja, A., Raś, Z.W. et al. Effect of speech segment samples selection in stutter block detection and remediation. J Intell Inf Syst 53, 241–264 (2019). https://doi.org/10.1007/s10844-019-00546-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10844-019-00546-z