Abstract
This study presents an approach to automatically classifying and detecting speaking styles. Detecting speaking styles is useful for segmenting multimedia data into consistent parts and has important applications, such as identifying speech segments for training acoustic models for speech recognition. The database used in this work consists of daily news broadcasts from Portuguese television, in which two main speaking styles are evident: read speech from voice-overs and anchors, and spontaneous speech from interviews and commentaries. Using a combination of phonetic and prosodic features, we can separate these two speaking styles with good accuracy (93.7% for read speech, 69.5% for spontaneous speech). The procedure has two steps: the first separates speech segments from non-speech audio segments, and the second classifies the speech segments as read or spontaneous. The phonetic and prosodic features provide alternative information that improves both the classification and the detection task.
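The two-step procedure described in the abstract can be sketched as a simple cascade. This is only an illustration under stated assumptions, not the authors' implementation: the feature names (`voiced_fraction`, `filled_pause_rate`), the thresholds, and the helper functions are all hypothetical placeholders standing in for trained classifiers.

```python
from typing import Callable, Dict, List

# A segment is represented by a dictionary of feature values.
# The feature names used below are hypothetical, chosen only for illustration.
Features = Dict[str, float]


def two_step_labels(
    segments: List[Features],
    is_speech: Callable[[Features], bool],
    is_read: Callable[[Features], bool],
) -> List[str]:
    """Label each segment with a two-step cascade.

    Step 1 separates speech from non-speech audio.
    Step 2 is applied only to speech segments and separates
    read from spontaneous speech.
    """
    labels = []
    for feats in segments:
        if not is_speech(feats):      # step 1: speech vs. non-speech
            labels.append("non-speech")
        elif is_read(feats):          # step 2: read vs. spontaneous
            labels.append("read")
        else:
            labels.append("spontaneous")
    return labels


# Placeholder decision rules (assumptions): a real system would use
# classifiers trained on phonetic and prosodic features instead.
def toy_is_speech(feats: Features) -> bool:
    return feats["voiced_fraction"] > 0.3


def toy_is_read(feats: Features) -> bool:
    return feats["filled_pause_rate"] < 0.05


segments = [
    {"voiced_fraction": 0.05, "filled_pause_rate": 0.00},  # e.g. music or jingle
    {"voiced_fraction": 0.70, "filled_pause_rate": 0.01},  # e.g. anchor reading
    {"voiced_fraction": 0.60, "filled_pause_rate": 0.12},  # e.g. interview
]
labels = two_step_labels(segments, toy_is_speech, toy_is_read)
print(labels)
```

Keeping the two decisions as separate callables mirrors the structure in the abstract: the second classifier never sees non-speech audio, so each stage can be trained and evaluated on its own.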
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Veiga, A., Celorico, D., Proença, J., Candeias, S., Perdigão, F. (2012). Prosodic and Phonetic Features for Speaking Styles Classification and Detection. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_10
DOI: https://doi.org/10.1007/978-3-642-35292-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science (R0)