Prosodic and Phonetic Features for Speaking Styles Classification and Detection

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This study presents an approach to automatically classifying and detecting speaking styles. Detecting speaking styles is useful for segmenting multimedia data into consistent parts and has important applications, such as identifying speech segments for training acoustic models for speech recognition. The database used in this work consists of daily news broadcasts from Portuguese television, in which two main speaking styles are evident: read speech from voice-overs and anchors, and spontaneous speech from interviews and commentaries. Using a combination of phonetic and prosodic features, we can separate these two speaking styles with good accuracy (93.7% for read speech, 69.5% for spontaneous speech). The task is performed in two steps: the first separates speech segments from non-speech audio segments, and the second classifies the speaking style as read or spontaneous. Phonetic and prosodic features provide alternative information that improves the classification and detection task.
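
The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the three-value feature vector (voicing ratio, speech rate, pause ratio), the centroid values, the toy segments, and the nearest-centroid classifier, which merely stands in for whatever classifier the paper actually uses. The per-class accuracy helper shows how figures reported separately for read and spontaneous speech could be computed; the numbers it yields here come from made-up data, not from the paper.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def nearest_centroid(centroids: Dict[str, Features]) -> Classifier:
    """Build a toy classifier that returns the label of the closest centroid."""
    def classify(x: Features) -> str:
        def sq_dist(c: Features) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda label: sq_dist(centroids[label]))
    return classify


def two_step_style(x: Features, speech_detector: Classifier,
                   style_classifier: Classifier) -> str:
    """Step 1 filters out non-speech; step 2 labels the remaining speech."""
    if speech_detector(x) != "speech":
        return "non-speech"
    return style_classifier(x)


def per_class_accuracy(truth: List[str], predicted: List[str]) -> Dict[str, float]:
    """Fraction of segments of each true class that were labelled correctly."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for t, p in zip(truth, predicted):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical features per segment: (voicing ratio, speech rate, pause ratio).
# All centroids and segments below are invented for illustration only.
speech_detector = nearest_centroid({
    "speech": (0.70, 5.0, 0.15),
    "non-speech": (0.10, 0.0, 0.90),
})
style_classifier = nearest_centroid({
    "read": (0.75, 6.0, 0.08),
    "spontaneous": (0.65, 4.0, 0.25),
})

segments = [
    ((0.74, 5.8, 0.10), "read"),
    ((0.66, 4.2, 0.22), "spontaneous"),
    ((0.05, 0.1, 0.95), "non-speech"),
    ((0.72, 5.2, 0.12), "spontaneous"),  # fluent spontaneous speech, a hard case
]

truth = [label for _, label in segments]
predictions = [two_step_style(x, speech_detector, style_classifier)
               for x, _ in segments]
accuracy = per_class_accuracy(truth, predictions)

for label, value in accuracy.items():
    print(f"{label}: {value:.1%}")
```

In this toy run the last segment is labelled as read speech, so the spontaneous class scores lower than the read class. Reporting accuracy per class, as the abstract does, makes such an imbalance visible where a single overall figure would hide it.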

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Veiga, A., Celorico, D., Proença, J., Candeias, S., Perdigão, F. (2012). Prosodic and Phonetic Features for Speaking Styles Classification and Detection. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_10

  • DOI: https://doi.org/10.1007/978-3-642-35292-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35291-1

  • Online ISBN: 978-3-642-35292-8

  • eBook Packages: Computer Science (R0)